now.common.preprocess module

now.common.preprocess.preprocess_images(da)

Loads all documents into memory to thumbnail them.

Return type

DocumentArray

now.common.preprocess.preprocess_text(da, split_by_sentences=False)

Loads text for all documents if necessary and, if requested, splits documents by sentences.

If split_by_sentences is set to True, generates sentence chunks:

Before: Document(chunks=[Document(text='s1. s2. s3')])

After: Document(chunks=[Document(text=None, chunks=[Document('s1'), Document('s2'), ...])])

Return type

DocumentArray
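The before/after shapes described for split_by_sentences can be illustrated with a minimal sketch. It does not use the library's classes: `Doc` is a hypothetical stand-in for Document, `split_chunks_by_sentence` is a hypothetical helper, and the regex-based splitting rule is an assumption, since the page does not say which sentence splitter is used. Unlike the example above, this naive rule keeps trailing punctuation on each sentence.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Document; not the library's class."""
    text: Optional[str] = None
    chunks: List["Doc"] = field(default_factory=list)


def split_chunks_by_sentence(doc: Doc) -> Doc:
    """Replace each text chunk with a text-less chunk holding one sub-chunk per sentence."""
    new_chunks = []
    for chunk in doc.chunks:
        # Assumed rule: a sentence ends at '.', '!' or '?' followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", chunk.text or "")
        sentences = [p.strip() for p in parts if p.strip()]
        # The outer chunk loses its text; the sentences become nested chunks.
        new_chunks.append(Doc(text=None, chunks=[Doc(text=s) for s in sentences]))
    return Doc(text=doc.text, chunks=new_chunks)


before = Doc(chunks=[Doc(text="s1. s2. s3")])
after = split_chunks_by_sentence(before)
print([c.text for c in after.chunks[0].chunks])
```

The sketch returns a new object rather than modifying its input; whether preprocess_text works in place is not stated here.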

now.common.preprocess.preprocess_nested_docs(da, user_input)

Processes a DocumentArray whose `Document`s have `chunks` of nested `Document`s. It constructs `Document`s containing two chunks: one containing image data and another containing text data. Fields for indexing should be specified in the `UserInput`.

Parameters
  • da (DocumentArray) – A DocumentArray containing nested chunks.

  • user_input (UserInput) – The configured user input.

Return type

DocumentArray

Returns

A DocumentArray with `Document`s containing text and image chunks.

now.common.preprocess.filter_data(documents, modalities)

Filters data based on modalities.

Parameters
  • documents (DocumentArray) – Documents to be filtered.

  • modalities (List[str]) – List of modalities that should be kept.

Return type

DocumentArray
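As a rough illustration of filtering by modality, the sketch below keeps only the items whose modality appears in the given list. It is not the library's implementation: `Doc`, its `modality` field, and `keep_modalities` are hypothetical names, and how filter_data actually determines a document's modality is not described on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a Document; the `modality` field is an assumption."""
    id: str
    modality: str


def keep_modalities(documents: List[Doc], modalities: List[str]) -> List[Doc]:
    """Return the documents whose modality is in `modalities`, preserving order."""
    wanted = set(modalities)  # set lookup avoids rescanning the list per document
    return [doc for doc in documents if doc.modality in wanted]


docs = [Doc("a", "image"), Doc("b", "text"), Doc("c", "music")]
kept = keep_modalities(docs, ["image", "text"])
print([doc.id for doc in kept])
```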