now.data_loading.es.data_extractor module

class now.data_loading.es.data_extractor.ElasticsearchExtractor(query, index, connection_str, connection_args=None)

Bases: object

This class extracts documents from Elasticsearch into a docarray.DocumentArray dataset. It implements an iterator that yields docarray.Document objects. To specify the data to extract, provide an Elasticsearch query together with the index name and the parameters needed to connect to the Elasticsearch instance.

Parameters

query (Dict) – Elasticsearch query in JSON form (typed as a dictionary)

index (str) – Name of the ES index containing the documents to be extracted

connection_str (str) – A connection string for the ES instance. It usually includes the URL, port, username, password, etc., and typically has the form 'https://{user_name}:{password}@{host}:{port}'

connection_args (Optional[Dict]) – Dictionary with additional connection arguments, e.g., information about certificates
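As a rough illustration of the constructor arguments, the sketch below assembles a connection string in the form given above and prepares a query dictionary. The helper build_connection_str, the index name, the exact query shape, and the verify_certs entry are assumptions made for this example and are not taken from the package. The constructor call itself is left commented out because it needs the package and a reachable Elasticsearch instance.

```python
from urllib.parse import quote


def build_connection_str(user_name, password, host, port, scheme="https"):
    """Assemble '{scheme}://{user_name}:{password}@{host}:{port}'.

    Hypothetical helper (not part of the package): percent-encodes the
    credentials so characters such as '@' or ':' do not break the URL.
    """
    user = quote(user_name, safe="")
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{host}:{port}"


connection_str = build_connection_str("elastic", "p@ss:word", "localhost", 9200)

# An Elasticsearch query as a dictionary; match_all selects every document.
# Assumption: the extractor accepts a full request body with a "query" key.
query = {"query": {"match_all": {}}}

# Assumed usage, following the documented signature:
# from now.data_loading.es.data_extractor import ElasticsearchExtractor
# extractor = ElasticsearchExtractor(
#     query=query,
#     index="my-index",  # placeholder index name
#     connection_str=connection_str,
#     connection_args={"verify_certs": False},  # assumption: passed on to the ES client
# )
# docs = extractor.extract()
```

Whether the credentials need percent-encoding depends on how the connection string is parsed downstream, so treat the encoding step as a precaution and not as a documented requirement.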

extract()

Returns extracted data as a DocumentArray where every Document contains chunks for each field. For example:

    Document(chunks=[
        Document(content='hello', modality='text', tags={'field_name': 'title'}),
        Document(content='https://bla.com/img.jpeg', modality='image', tags={'field_name': 'uris'}),
    ])

Return type

DocumentArray
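One way to consume the returned structure is to group each Document's chunk contents by the field_name tag. The sketch below is a minimal illustration under stated assumptions: contents_by_field is a hypothetical helper, and plain SimpleNamespace objects stand in for docarray Documents so the snippet runs without docarray or an Elasticsearch instance. It relies only on the chunks, content, and tags attributes shown in the example above.

```python
from collections import defaultdict
from types import SimpleNamespace


def contents_by_field(doc):
    """Group the chunk contents of one extracted Document by tags['field_name']."""
    grouped = defaultdict(list)
    for chunk in doc.chunks:
        grouped[chunk.tags["field_name"]].append(chunk.content)
    return dict(grouped)


# Stand-in for one extracted Document (plain objects, not docarray).
doc = SimpleNamespace(chunks=[
    SimpleNamespace(content="hello", modality="text",
                    tags={"field_name": "title"}),
    SimpleNamespace(content="https://bla.com/img.jpeg", modality="image",
                    tags={"field_name": "uris"}),
])

fields = contents_by_field(doc)
```

Values are collected into lists because a single field may contribute more than one chunk; that is an assumption about the data, not something the documentation states.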