now.finetuning.data_builder module

class now.finetuning.data_builder.EncoderDataBuilder(name, methods, encoder_type)

Bases: object

Fine-tuning data generation for a specific encoder.

Parameters
  • name (str) – Name of the encoder.

  • methods (List[TrainDataGenerationConfig]) – List of methods to generate fine-tuning data.

  • encoder_type (str) – Type of the encoder, either single_model or multi_model.

build(es_data)

Generates query and target data based on method(s).

If the encoder is single_model, the generated queries and targets are saved separately. If it is multi_model, query-target pairs are generated instead.

Parameters

es_data (DocumentArray) – Data extracted from ES.

Return type

DocumentArray

Returns

DocumentArray of generated data.
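
The difference between the two encoder types can be illustrated with a minimal, library-free sketch. It is not the package's implementation: plain dictionaries stand in for DocumentArray entries, and the function name build_finetuning_data, the field names title and description, and the layout of the returned values are all hypothetical.

```python
from typing import Dict, List, Tuple, Union

Record = Dict[str, str]
SingleModelData = Dict[str, List[str]]
MultiModelData = List[Tuple[str, str]]


def build_finetuning_data(
    records: List[Record],
    encoder_type: str,
    query_field: str = "title",         # hypothetical field name
    target_field: str = "description",  # hypothetical field name
) -> Union[SingleModelData, MultiModelData]:
    """Illustrative stand-in for the documented single_model / multi_model split."""
    if encoder_type == "single_model":
        # Queries and targets are kept in separate collections.
        return {
            "queries": [r[query_field] for r in records],
            "targets": [r[target_field] for r in records],
        }
    if encoder_type == "multi_model":
        # Each query is kept together with its target as a pair.
        return [(r[query_field], r[target_field]) for r in records]
    raise ValueError(f"Unknown encoder_type: {encoder_type!r}")


records = [
    {"title": "red shoe", "description": "A red running shoe."},
    {"title": "blue hat", "description": "A blue wool hat."},
]
single = build_finetuning_data(records, "single_model")
pairs = build_finetuning_data(records, "multi_model")
```

Here single holds two separate lists, while pairs holds one (query, target) tuple per record.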

property name

property encoder_type

class now.finetuning.data_builder.DataBuilder(dataset, config, num_workers=None, threads_per_worker=4)

Bases: object

Fine-tuning data generation for a specific task.

Parameters
  • dataset (DocumentArray) – A DocumentArray extracted from ES.

  • config (Task) – A Task object containing information about data generation methods.

  • num_workers (Optional[int]) – Number of workers for data generation.

  • threads_per_worker (Optional[int]) – Number of threads per worker.

build(to_hubble=False, data_dir=None)

Generates data from the ES dataset based on the task configuration file.

The generated data can also be uploaded to Hubble, saved locally, or both.

Parameters
  • to_hubble (bool) – Uploads data to Hubble if True.

  • data_dir (Optional[str]) – Saves data locally in the given directory if it’s not None.

Return type

List[Tuple[DocumentArray, str]]

Returns

Generated data for each encoder.
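
Since build returns a list of (data, string) tuples, a caller will typically iterate over it. The sketch below shows one way to summarize such a result. It makes several assumptions: plain lists stand in for DocumentArray objects, the helper summarize_build_output is hypothetical, and the string in each tuple is treated as an opaque label because its meaning is not stated here.

```python
from typing import Dict, List, Sequence, Tuple


def summarize_build_output(
    results: List[Tuple[Sequence, str]],
) -> Dict[str, int]:
    """Count generated items per label in a build()-style result list."""
    summary: Dict[str, int] = {}
    for data, label in results:
        # Accumulate so that repeated labels are summed, not overwritten.
        summary[label] = summary.get(label, 0) + len(data)
    return summary


# With the real package this list would come from something like:
#   results = DataBuilder(dataset, config).build(data_dir="out")
# Stand-in data with hypothetical labels is used so the sketch runs on its own.
results = [
    (["doc1", "doc2", "doc3"], "label_a"),
    (["doc4"], "label_b"),
]
summary = summarize_build_output(results)
```

Any object that supports len() works as the data element, so the same helper should apply unchanged to real DocumentArray results.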