Guide to loading your data#
When deploying your search app, you will be asked to select the input format of your data. This is the format of the data that you will be sending to the API. We currently support the following input formats for your custom input: DocumentArray, local path or S3 bucket. In this section, we will explain each of these options in more detail.
If you have loaded your data as a DocumentArray, this option is perfect for you. In this case, you can simply provide the name of your DocumentArray as the input. For example, if you have pushed a DocumentArray named cat_pictures, you can enter cat_pictures as the input, which will automatically pull your dataset.
? How do you want to provide input? (format: https://docarray.jina.ai/) DocumentArray name (recommended)
? Please enter your DocumentArray name: cat_pictures
If you are using this DocumentArray option, please make sure to model your data using the @dataclass decorator from docarray. This allows you to model nested and multi-modal data as follows:
from docarray import dataclass
from docarray.typing import Image, Text

@dataclass
class Page:
    main_text: Text
    image: Image
    description: Text
In this dataclass model, we have a Page document that has three fields: main_text (Text), image (Image), and description (Text).
You can instantiate the dataclass model with your actual data, and cast it to a Document as follows:
from docarray import Document, DocumentArray

page = Page(
    main_text='Hello world',
    image='apple.png',
    description='This is the image of an apple',
)

doc = Document(page)
da = DocumentArray([doc])
da.push(name="my_pages")
In the above example, we instantiate a Page document with some dummy data, cast it to a Document, and finally add it to a DocumentArray, which we can push to Jina Cloud under the name "my_pages". This is the same name that we will use when deploying our search app with NOW.
More information about how to create and push your own DocumentArray can be found in the DocArray documentation (https://docarray.jina.ai/).
If you have your data stored locally, you can provide the path to the folder containing your data. The folder should contain all files that you want to index.
Here is an example of a folder structure for text-to-image search:
usr
├── data
│   ├── images
│   │   ├── 1.jpg
│   │   ├── 2.jpg
│   │   ├── 3.jpg
│   │   ├── 4.jpg
In this case, the local path you should provide is /usr/data/images, as follows:
? How do you want to provide input? (format: https://docarray.jina.ai/) Local folder
? Please enter your local path: /usr/data/images
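Before entering a local path, it can help to check what the folder actually contains. The following is a minimal sketch using only the Python standard library. The helper name summarize_folder is hypothetical and not part of NOW, and it only counts files by extension; it does not check whether NOW supports them.

```python
from collections import Counter
from pathlib import Path


def summarize_folder(folder: str) -> Counter:
    """Count the files directly inside `folder`, grouped by lowercase extension.

    Hypothetical helper for illustration; it is not part of NOW.
    """
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder!r} is not a directory")
    # Only regular files are counted; subfolders are ignored in this sketch.
    return Counter(p.suffix.lower() for p in root.iterdir() if p.is_file())
```

For the example structure above, calling summarize_folder("/usr/data/images") should report four .jpg files.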
AWS S3 bucket#
If you have your data stored in an AWS S3 bucket, you can provide the S3 URI, your AWS access key ID, your AWS secret access key, and the AWS region.
Similar to the local folder option, the S3 bucket should contain all files that you want to index.
The only difference is that the S3 URI should be in the following format: s3://<bucket-name>/<path-to-data>
Taking the example structure from above, <path-to-data> would be the path to the images folder inside your bucket.
Here is an example of what your interaction may look like in the CLI:
? How do you want to provide input? (format: https://docarray.jina.ai/) S3 bucket
? Please enter the S3 URI to the folder: s3://<bucket-name>/<path-to-data>
? Please enter the AWS access key ID: <my-key-id>
? Please enter the AWS secret access key: <my-access-key>
? Please enter the AWS region: <my-region>
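If you want to sanity-check a URI before typing it into the prompt, a small helper can split it into its bucket and path parts. This is a sketch under the assumption that the URI follows the s3://<bucket-name>/<path-to-data> form shown above; the function name split_s3_uri is hypothetical and not part of NOW.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://<bucket-name>/<path-to-data>' into (bucket, path).

    Hypothetical helper for illustration; it is not part of NOW.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(
            f"expected s3://<bucket-name>/<path-to-data>, got {uri!r}"
        )
    # Drop the leading slash so the path is relative to the bucket root.
    return parsed.netloc, parsed.path.lstrip("/")
```

A local path such as /usr/data/images has no s3 scheme, so the helper rejects it with a ValueError instead of silently accepting it.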
Supported File Formats#
Here is an overview of the supported file formats for each modality:
.txt (can also have a different extension, but has to be plain text)
.png, … (everything supported by
.mp3, … (everything supported by
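Because text files may carry any extension as long as they are plain text, a quick check of the file contents can be more reliable than looking at the file name. The sketch below treats "plain text" as "decodes as UTF-8 and contains no NUL bytes". That definition is an assumption made for illustration, since the encoding NOW expects is not stated here, and looks_like_plain_text is a hypothetical helper name.

```python
from pathlib import Path


def looks_like_plain_text(path: str) -> bool:
    """Return True if the file decodes as UTF-8 and contains no NUL bytes.

    Assumption for this sketch: "plain text" means valid UTF-8.
    The whole file is read into memory, so this suits small files.
    """
    data = Path(path).read_bytes()
    if b"\x00" in data:
        # NUL bytes are a strong hint that the file is binary.
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

With this check, a Markdown or log file passes, while an image that was renamed to .txt does not.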
Search and filter fields#
Once you have chosen your input type and ensured that your data is in the correct format, you will be asked to select the fields from your dataset that you want to search and filter on. NOW automatically detects these fields and lists them for you to choose from. You can select only one field for searching. Here is an example using the birds demo dataset:
? How do you want to provide input? (format: https://docarray.jina.ai/) Demo dataset
? What demo dataset do you want to use? 🦆 birds (≈12K docs)
? Please select the index fields: (<up>, <down> to move, <space> to select, <a> to toggle, <i> to invert)
 ○ label
❯○ image
? Please select the filter fields (<up>, <down> to move, <space> to select, <a> to toggle, <i> to invert)
❯◯ label
In the above command-line interaction, we have selected the birds dataset, and we can see that the label and image fields are available for us to search on. We have selected the image field for searching, and the label field for filtering.
Now that you have selected your input format and the fields you want to search and filter on, you can move on to the next step. There you will be asked to choose a name for your search app, where to deploy it (📍 local or on ⛅️ Jina Cloud), and whether you want to secure your application.