now.executor.indexer.elastic.elastic_indexer module

class now.executor.indexer.elastic.elastic_indexer.FieldEmbedding(encoder, embedding_size, fields)

Bases: tuple

Create new instance of FieldEmbedding(encoder, embedding_size, fields)

property embedding_size

Alias for field number 1

property encoder

Alias for field number 0

property fields

Alias for field number 2
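
The field numbers above follow from FieldEmbedding being a named tuple. A minimal sketch, assuming a stand-in definition built with collections.namedtuple (the real class is defined in this module; the encoder name and field names used here are made up for illustration):

```python
from collections import namedtuple

# Stand-in for illustration only; the real class is defined in elastic_indexer.
FieldEmbedding = namedtuple(
    "FieldEmbedding", ["encoder", "embedding_size", "fields"]
)

# Hypothetical encoder name and field names.
mapping = FieldEmbedding("clip", 512, ["title", "image"])

# Attribute access and positional access refer to the same values.
same_encoder = mapping.encoder == mapping[0]
same_size = mapping.embedding_size == mapping[1]
same_fields = mapping.fields == mapping[2]
```

Because the tuple is positional, a plain `("clip", 512, ["title", "image"])` carries the same information, which matches the `List[Tuple[str, int, List[str]]]` type documented for document_mappings.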

class now.executor.indexer.elastic.elastic_indexer.NOWElasticIndexer(document_mappings, metric='cosine', limit=10, max_values_per_tag=10, es_mapping=None, es_config=None, *args, **kwargs)

Bases: NOWAuthExecutor

NOWElasticIndexer indexes Documents into an Elasticsearch instance. It uses helper functions from es_converter to convert Documents to and from the format that Elasticsearch accepts. Its score calculation combines the scores of different fields and encoders, so that multi-modal documents can be indexed and searched with multi-modal queries.

Parameters
  • document_mappings (List[Tuple[str, int, List[str]]]) – List of FieldEmbedding tuples (encoder, embedding_size, fields). Each tuple defines which encoder encodes which fields, and the embedding size of that encoder.

  • metric (str) – Distance metric type. Can be ‘euclidean’, ‘inner_product’, or ‘cosine’. Defaults to ‘cosine’.

  • limit (int) – Number of results to return for each query document in a search. Defaults to 10.

  • max_values_per_tag (int) – Maximum number of values per tag. Defaults to 10.

  • es_mapping (Optional[Dict]) – Mapping for a new index. If None, the mapping is generated from document_mappings and metric.

  • hosts – host configuration of the Elasticsearch node or cluster

  • es_config (Optional[Dict[str, Any]]) – Elasticsearch cluster configuration object

  • index_name – Name of the Elasticsearch index used for storage

generate_es_mapping()

Creates Elasticsearch mapping for the defined document fields.

Return type

Dict
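
As a rough illustration of how a mapping could be derived from document_mappings and metric, the sketch below builds one Elasticsearch `dense_vector` property per encoded field. This is not the output of generate_es_mapping: the function name `sketch_es_mapping` and the `field-encoder` naming scheme are assumptions made for the example, as are the encoder and field names.

```python
from typing import Any, Dict, List, Tuple

# Elasticsearch similarity names for dense_vector fields, keyed by the
# metric names documented for NOWElasticIndexer.
SIMILARITY = {
    "cosine": "cosine",
    "euclidean": "l2_norm",
    "inner_product": "dot_product",
}


def sketch_es_mapping(
    document_mappings: List[Tuple[str, int, List[str]]],
    metric: str = "cosine",
) -> Dict[str, Any]:
    """Build a mapping dict with one vector property per (field, encoder)."""
    properties: Dict[str, Any] = {}
    for encoder, embedding_size, fields in document_mappings:
        for field in fields:
            # Hypothetical naming scheme, chosen only for this example.
            properties[f"{field}-{encoder}"] = {
                "type": "dense_vector",
                "dims": embedding_size,
                "similarity": SIMILARITY[metric],
                "index": True,
            }
    return {"properties": properties}


# Hypothetical encoder and field names.
es_mapping = sketch_es_mapping([("clip", 512, ["title", "image"])], metric="euclidean")
```

Passing an explicit es_mapping to the constructor bypasses generation, as described in the parameter list above.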

index(**kwargs)
search(**kwargs)
update(**kwargs)
list(**kwargs)
count(**kwargs)
delete(**kwargs)
filters(**kwargs)
curate(**kwargs)
update_curated_ids(search_filter)
update_tags()

The indexer keeps track of which tags are indexed and what their possible values are in self.filters_val_dict. This method first reads the current mapping of the Elasticsearch index to find the tags present on the indexed documents. It then runs an aggregation query over the values of each tag field and updates self.filters_val_dict, using tag names as keys and the lists of their values as dictionary values.
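
The aggregation step described above can be sketched with plain dictionaries. This is an illustration under assumptions, not the method's actual code: the helper names `build_tag_aggregation` and `parse_tag_values`, the `tags.<name>` field path, and the sample response are all hypothetical. The request body uses the standard Elasticsearch `terms` aggregation.

```python
from typing import Any, Dict, List


def build_tag_aggregation(tags: List[str], max_values_per_tag: int = 10) -> Dict[str, Any]:
    """Build a search body that returns no hits, only one terms aggregation per tag."""
    return {
        "size": 0,
        "aggs": {
            # Assumed field path; the real index layout may differ.
            tag: {"terms": {"field": f"tags.{tag}", "size": max_values_per_tag}}
            for tag in tags
        },
    }


def parse_tag_values(response: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Turn the aggregation buckets into a {tag: [values]} dictionary."""
    return {
        tag: [bucket["key"] for bucket in agg["buckets"]]
        for tag, agg in response.get("aggregations", {}).items()
    }


# Hand-written response in the shape Elasticsearch returns for terms aggregations.
fake_response = {
    "aggregations": {
        "color": {
            "buckets": [
                {"key": "red", "doc_count": 3},
                {"key": "blue", "doc_count": 1},
            ]
        }
    }
}
filters_val_dict = parse_tag_values(fake_response)
```

In this sketch max_values_per_tag caps the number of buckets requested per tag, which is one plausible reading of that constructor parameter.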

now.executor.indexer.elastic.elastic_indexer.aggregate_embeddings(docs_map)

Aggregate embeddings from the cc level (chunks of chunks) to the c level (chunks).

Parameters

docs_map (Dict[str, DocumentArray]) – a dictionary of `DocumentArray`s, keyed by embedding space (that is, the encoder name).