Adding Metadata to Object Storage Files for Search Filtering

You can add metadata to Object Storage files before syncing them to a vector store. Metadata helps improve retrieval by letting semantic and hybrid searches filter results by relevant attributes.

For example, you can add metadata such as publication year, title, topic, department, product area, or document type. After the files are synced, those metadata fields can be used to narrow search results to a specific content scope.

Metadata is optional. However, if you want metadata to be available for search filtering, add or update the metadata before you perform data sync. Metadata added after a sync isn’t included in that sync unless you run data sync again.

This topic applies to vector stores that sync unstructured data from Object Storage.

Note

Add or update metadata before you perform the data sync. Metadata added after a sync isn’t included in that sync.

How Metadata Works

Metadata is defined as key-value pairs. To use metadata with Object Storage files, you first define the metadata fields in a schema file. Then, you associate files in the bucket with metadata values.

For all Object Storage metadata methods, you must create a metadata schema file named _metadata_schema.json at the root level of the Object Storage bucket. The schema defines the metadata keys that the service can expect and the value type for each key.

If _metadata_schema.json doesn’t exist at the root level of the bucket, the service doesn’t process metadata for any file in the bucket.

Each metadata field has a name and a type. Supported metadata types are:

  • integer
  • string
  • list_of_string
  • double
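
As an illustration of the four types, the following Python sketch checks whether a value fits a declared type. The function name matches_type is hypothetical. Rejecting booleans and accepting whole numbers for double fields are assumptions made for this sketch, not documented service behavior.

```python
def matches_type(value, type_name):
    """Return True if value fits the given metadata schema type name."""
    if type_name == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "list_of_string":
        return isinstance(value, list) and all(isinstance(v, str) for v in value)
    if type_name == "double":
        # Assumption: whole numbers such as 4 are acceptable for double fields.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"Unsupported metadata type: {type_name}")


checks = [
    matches_type(2020, "integer"),
    matches_type("Healthy Cooking Guide", "string"),
    matches_type(["cooking", "health"], "list_of_string"),
    matches_type(4.5, "double"),
]
```

A check like this can run locally before upload, so that type mismatches are caught before the data sync rather than after it.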

Workflow Overview

Use the following workflow to prepare metadata before syncing files to a vector store:

  1. In a text editor, create a metadata schema file named _metadata_schema.json.
  2. Define the metadata fields and value types in JSON format.
  3. Upload _metadata_schema.json to the root level of the Object Storage bucket that contains the files to sync.
  4. Select how to associate metadata values with files:
    • Apply common metadata to all files in the bucket.
    • Define metadata for several files in one JSON file.
    • Define metadata in a separate JSON file for each data file.
    • Add metadata by using Object Storage metadata properties.
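  5. Upload the metadata files to the Object Storage bucket: _common.metadata.json and _all.metadata.json at the root level, and each <file-name>.metadata.json at the same level as its data file.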
  6. Perform data sync for the vector store.
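
Steps 1 and 2 of this workflow can also be scripted. The following Python sketch writes _metadata_schema.json into a local staging folder that mirrors the bucket root. The write_schema function and the staging folder are hypothetical, and uploading the file to the bucket is left out.

```python
import json
import tempfile
from pathlib import Path

SUPPORTED_TYPES = {"integer", "string", "list_of_string", "double"}


def write_schema(staging_dir, fields):
    """Write _metadata_schema.json into staging_dir and return its path.

    fields maps each metadata field name to its type name.
    """
    for name, type_name in fields.items():
        if type_name not in SUPPORTED_TYPES:
            raise ValueError(f"{name}: unsupported type {type_name!r}")
    schema = {
        "metadataSchema": [
            {"name": name, "type": type_name} for name, type_name in fields.items()
        ]
    }
    path = Path(staging_dir) / "_metadata_schema.json"
    path.write_text(json.dumps(schema, indent=2), encoding="utf-8")
    return path


# Hypothetical local folder that mirrors the root level of the bucket.
staging = tempfile.mkdtemp()
schema_path = write_schema(
    staging,
    {"publication_year": "integer", "title": "string", "topic": "list_of_string"},
)
```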

Metadata Schema Example

Create a metadata schema file named _metadata_schema.json and save it at the root level of the Object Storage bucket.

{
  "metadataSchema": [
    {
      "name": "publication_year",
      "type": "integer"
    },
    {
      "name": "title",
      "type": "string"
    },
    {
      "name": "topic",
      "type": "list_of_string"
    },
    {
      "name": "rating",
      "type": "double"
    }
  ]
}

The metadata names that you use in metadata files must match the names defined in the schema.
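
To illustrate the name-matching rule, the following Python sketch lists metadata keys that aren't declared in the schema. The unknown_keys function and the author field are hypothetical. How the service treats undeclared keys isn't stated here, so the sketch only reports them.

```python
import json

SCHEMA_TEXT = """
{
  "metadataSchema": [
    {"name": "publication_year", "type": "integer"},
    {"name": "title", "type": "string"},
    {"name": "topic", "type": "list_of_string"},
    {"name": "rating", "type": "double"}
  ]
}
"""


def unknown_keys(schema, metadata):
    """Return the metadata keys that the schema doesn't declare, sorted."""
    declared = {field["name"] for field in schema["metadataSchema"]}
    return sorted(set(metadata["metadataAttributes"]) - declared)


schema = json.loads(SCHEMA_TEXT)
metadata = {"metadataAttributes": {"publication_year": 2020, "author": "A. Writer"}}
print(unknown_keys(schema, metadata))  # ['author']
```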

Metadata Methods for Object Storage Files

The following table describes the supported methods for adding metadata to files in Object Storage, including where to create each metadata file or header and when to use each method.

Method | File name and location | When to use
Define the metadata schema | Create _metadata_schema.json at the root level of the Object Storage bucket. | Required for all Object Storage metadata methods. The schema defines the supported metadata keys and value types.
Apply common metadata to all files in a bucket | Create _common.metadata.json at the root level of the Object Storage bucket. | Use when the same metadata applies to all files in the bucket. This method avoids duplicating metadata across files.
Define metadata for several files in one JSON file | Create _all.metadata.json at the root level of the Object Storage bucket. | Use when you have many files and prefer to manage their metadata in one JSON file instead of creating one metadata file per data file.
Define metadata for one file | Create <file-name>.metadata.json at the same level as the corresponding data file. The <file-name> value must match the full name of the data file, including its extension (for example, file_1.pdf.metadata.json for file_1.pdf). | Use when metadata differs by file and you have a small number of files, or when you automate metadata file creation.
Add metadata as Object Storage headers | Add metadata by using each file’s Object Storage metadata properties. | Use only when you have a small number of metadata properties. JSON metadata files are recommended because they’re easier to update and manage.

Metadata File Location Example

The following example shows where to save metadata files in an Object Storage bucket.

bucket_root/
  _metadata_schema.json
  _common.metadata.json
  _all.metadata.json
  file_0.pdf
  file_0.pdf.metadata.json
  folder_1/
    file_1.pdf
    file_1.pdf.metadata.json
  folder_2/
    file_2.pdf
    file_2.pdf.metadata.json

For file-specific metadata, the metadata file must be saved at the same level as the corresponding data file.

For example, if the data file is saved as:

folder_1/file_1.pdf

the metadata file must be saved as:

folder_1/file_1.pdf.metadata.json
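
The naming rule can be expressed as a small helper. The following Python sketch derives the metadata file path from a data file path and lists data files that have no per-file metadata file. Both function names are hypothetical, and the object listing is sample data. A missing per-file metadata file isn't necessarily an error, because the other methods can supply metadata instead.

```python
SUFFIX = ".metadata.json"
SCHEMA_FILE = "_metadata_schema.json"


def metadata_path_for(data_path):
    """Return the per-file metadata path for a data file path."""
    return data_path + SUFFIX


def missing_metadata(object_names):
    """Return data files in the listing that have no per-file metadata file."""
    names = set(object_names)
    # Skip the schema file and every metadata file, which leaves the data files.
    data_files = [
        name
        for name in object_names
        if name != SCHEMA_FILE and not name.endswith(SUFFIX)
    ]
    return [name for name in data_files if metadata_path_for(name) not in names]


objects = [
    "_metadata_schema.json",
    "file_0.pdf",
    "file_0.pdf.metadata.json",
    "folder_1/file_1.pdf",
    "folder_1/file_1.pdf.metadata.json",
    "folder_2/file_2.pdf",
]
print(missing_metadata(objects))  # ['folder_2/file_2.pdf']
```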

Metadata JSON File Examples

Common Metadata for All Files

Create _common.metadata.json at the root level of the bucket to apply the same metadata to all files in the bucket.

Example:

{
  "metadataAttributes": {
    "publication_year": 2020,
    "topic": [
      "cooking",
      "health",
      "gardening"
    ],
    "rating": 3.3
  }
}

Metadata for Several Files

Create _all.metadata.json at the root level of the bucket to define metadata for several files in one JSON file. As the example shows, each top-level key is the path of a data file relative to the bucket root.

Example:

{
  "folder_1/file_1.pdf": {
    "metadataAttributes": {
      "publication_year": 2020,
      "title": "Healthy Cooking Guide",
      "topic": [
        "cooking",
        "health"
      ],
      "rating": 4.5
    }
  },
  "folder_2/file_2.pdf": {
    "metadataAttributes": {
      "publication_year": 2022,
      "title": "Gardening Basics",
      "topic": [
        "gardening"
      ],
      "rating": 4.0
    }
  }
}
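
When metadata already exists in another system, a file in this shape can be generated instead of written by hand. The following Python sketch builds the same structure from a list of (path, attributes) pairs. The build_all_metadata function and the records list are hypothetical.

```python
import json


def build_all_metadata(records):
    """Map each data file path to a {"metadataAttributes": ...} entry."""
    return {path: {"metadataAttributes": dict(attrs)} for path, attrs in records}


# Sample records. In practice these might come from a catalog or a spreadsheet.
records = [
    ("folder_1/file_1.pdf", {"publication_year": 2020, "title": "Healthy Cooking Guide"}),
    ("folder_2/file_2.pdf", {"publication_year": 2022, "title": "Gardening Basics"}),
]

document = build_all_metadata(records)
# Text to save as _all.metadata.json at the root level of the bucket.
text = json.dumps(document, indent=2)
```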

Metadata for One File

Create <file-name>.metadata.json at the same level as the corresponding data file.

For example, to define metadata for file_1.pdf, create a file named file_1.pdf.metadata.json.

Example:

{
  "metadataAttributes": {
    "publication_year": 2020,
    "title": "Healthy Cooking Guide",
    "topic": [
      "cooking",
      "health"
    ],
    "rating": 4.5
  }
}

Metadata Limits

The following limits apply to metadata used for search filtering.

Description | Limit
Maximum number of entries in _all.metadata.json | 10,000
Maximum number of metadata fields that can be specified for each file | 20
Maximum number of items in a list_of_string type | 10
Maximum length of each item in a list_of_string type | 50 characters
Maximum length of a metadata key | 25 characters
Maximum length of a metadata value | 50 characters
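
These limits can be checked locally before upload. The following Python sketch reports violations for the metadata attributes of one file. The limit_violations function is hypothetical. Applying the value-length limit only to string values and counting characters with Python's len are assumptions of this sketch.

```python
MAX_FIELDS_PER_FILE = 20
MAX_LIST_ITEMS = 10
MAX_LIST_ITEM_LENGTH = 50
MAX_KEY_LENGTH = 25
MAX_VALUE_LENGTH = 50


def limit_violations(attributes):
    """Return a list of messages for each limit that attributes exceeds."""
    problems = []
    if len(attributes) > MAX_FIELDS_PER_FILE:
        problems.append(f"{len(attributes)} fields exceed {MAX_FIELDS_PER_FILE}")
    for key, value in attributes.items():
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"key {key!r} is longer than {MAX_KEY_LENGTH} characters")
        if isinstance(value, list):
            if len(value) > MAX_LIST_ITEMS:
                problems.append(f"{key!r} has more than {MAX_LIST_ITEMS} items")
            for item in value:
                if len(item) > MAX_LIST_ITEM_LENGTH:
                    problems.append(f"an item in {key!r} is longer than {MAX_LIST_ITEM_LENGTH} characters")
        elif isinstance(value, str) and len(value) > MAX_VALUE_LENGTH:
            # Assumption: the value-length limit applies to string values only.
            problems.append(f"value of {key!r} is longer than {MAX_VALUE_LENGTH} characters")
    return problems


ok = limit_violations({"title": "Healthy Cooking Guide", "topic": ["cooking", "health"]})
too_many_items = limit_violations({"topic": ["x"] * 11})
```

The limit of 10,000 entries in _all.metadata.json isn't covered by this function, because it applies to the whole file rather than to one file's attributes.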

Using Metadata with Data Sync

Add the metadata schema and metadata files before you sync data to the vector store.

After the files are synced, the metadata is available for search filtering. If you add or change metadata after syncing, perform data sync again so that the updated metadata is included in the vector store.

To sync files from Object Storage, see Sync Data to a Vector Store.