Cohere Embed 4
Cohere Embed 4 (cohere.embed-v4.0) is a multimodal embedding model that generates embeddings from text and image input. Image input is available through the API only.
Regions for this Model
For supported regions, endpoint types (on-demand or dedicated AI clusters), and hosting (OCI Generative AI or external calls) for this model, see the Models by Region page. For details about the regions, see the Generative AI Regions page.
Access this Model
The API links list the endpoints for all supported commercial, sovereign, and government regions.
Key Features
- Input limits:
  - Console: Up to 96 text inputs per run, with each text input under 512 tokens
  - SDK and API: Up to 128,000 total input tokens per run
- Input mode:
  - Input text or an image, but not both
  - For image embeddings, only one image is allowed per run
- Image input (available through the API only):
  - Requires a base64-encoded image
  - A 512 x 512 image is about 1,610 tokens
- Language support:
  - Text: English and multilingual
  - Image: English only
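Because the Console accepts at most 96 text inputs per run, a larger corpus has to be embedded in batches. The following sketch shows one way to split a corpus into runs that respect that limit; the helper name is illustrative and is not part of any OCI SDK.

```python
# Sketch: split a corpus into embedding runs of at most 96 inputs each
# (the documented Console limit). batch_inputs is a hypothetical helper.

def batch_inputs(texts, max_inputs_per_run=96):
    """Yield lists of at most max_inputs_per_run texts per run."""
    for start in range(0, len(texts), max_inputs_per_run):
        yield texts[start:start + max_inputs_per_run]

corpus = [f"sentence {i}" for i in range(200)]
runs = list(batch_inputs(corpus))
print([len(r) for r in runs])  # three runs: 96 + 96 + 8
```

Each yielded batch can then be sent as one embedding run.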
On-Demand Mode
On-demand mode is pay-as-you-go and is useful for experimentation, proof-of-concept work, and model evaluation. On the pricing page, this model is listed as:
| Model Name | OCI Model Name | Pricing Page Product Name |
|---|---|---|
| Cohere Embed 4 | cohere.embed-v4.0 | Embed Cohere |
Dynamic Throttling Limit Change for On-Demand Mode
OCI Generative AI dynamically adjusts the request throttling limit for each active tenancy based on model demand and system capacity to optimize resource allocation and ensure fair access. Because of dynamic throttling, rate limits are undocumented and can change to meet system-wide demand.
Because of the dynamic throttling limit change, we recommend implementing a back-off strategy, which involves delaying requests after a rejection. Without one, repeated rapid requests can lead to further rejections, increased latency, and potential temporary blocking of the client by the Generative AI service. A back-off strategy, such as exponential back-off, distributes requests more evenly, reduces load, and improves retry success, following industry best practices and enhancing the overall stability and performance of your integration with the service.
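An exponential back-off with jitter can be sketched as follows. This is a minimal, self-contained illustration: `send_request` stands in for any call to the service that raises on a throttling rejection (for example, an HTTP 429); none of these names come from the OCI SDK.

```python
import random
import time

def call_with_backoff(send_request, max_retries=5, base_delay=0.5, max_delay=8.0):
    """Retry a throttled call with exponential back-off and full jitter.

    send_request is any callable that raises when the service rejects
    the request; the names here are illustrative, not an OCI SDK API.
    """
    for attempt in range(max_retries + 1):
        try:
            return send_request()
        except Exception:
            if attempt == max_retries:
                raise  # give up after the final retry
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))  # full jitter spreads retries

# Demo: a call that is rejected twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429: too many requests")  # simulated throttle
    return "ok"

result = call_with_backoff(flaky, base_delay=0.01)
print(result)  # ok, after two retried rejections
```

Doubling the delay on each attempt while capping it at `max_delay` keeps retry traffic from amplifying a load spike.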
Dedicated AI Cluster for the Model
To use this model with a dedicated AI cluster, create an endpoint for the model in a supported region.
| Base Model | Fine-Tuning Cluster | Hosting Cluster | Pricing Page Information | Request Cluster Limit Increase |
|---|---|---|---|---|
| Cohere Embed 4 | Not available for fine-tuning | Unit size: Embed Cohere | Pricing page product name: Embed Cohere | Limit name: dedicated-unit-embed-cohere-count |
If you don't have enough hosting capacity, request an increase for the dedicated-unit-embed-cohere-count limit.
Endpoint Rules for Clusters
- A dedicated AI cluster can hold up to 50 endpoints.
- Use these endpoints to create aliases that all point either to the same base model or to the same version of a custom model, but not both types.
- Several endpoints for the same model make it easy to assign them to different users or purposes.
| Hosting Cluster Unit Size | Endpoint Rules |
|---|---|
| Embed Cohere | Up to 50 endpoints per cluster |

- To increase the call volume supported by a hosting cluster, increase its instance count by editing the dedicated AI cluster. See Updating a Dedicated AI Cluster.
- For more than 50 endpoints per cluster, request an increase for the endpoint-per-dedicated-unit-count limit. See Creating a Limit Increase Request and Service Limits for Generative AI.
Cluster Performance Benchmarks
Review the Cohere Embed 4 cluster performance benchmarks for different use cases.
OCI Release and Retirement Dates
For release and retirement dates and replacement model options, see the following pages based on the mode (on-demand or dedicated):
Input Data for Text Embeddings
Input data for creating text embeddings has the following requirements:
- You can add sentences, phrases, or paragraphs for embeddings either one phrase at a time, or by uploading a file.
- Only files with a .txt extension are allowed.
- If you use an input file, each input sentence, phrase, or paragraph in the file must be separated with a newline character.
- A maximum of 96 inputs is allowed for each run.
- In the Console, each input must be less than 512 tokens for the text-only models.
- If an input is too long, select whether to cut off the start or the end of the text to fit within the token limit by setting the Truncate parameter to Start or End. If an input exceeds the 512 token limit and the Truncate parameter is set to None, you get an error message.
- For the text and image models, you can have files and inputs that all add up to 128,000 tokens.
- For the text and image embed models, such as Cohere Embed English Image V3, you can add either text or one image, but not both. Image input is available only through the API, not the Console. For the API, input a base64-encoded image in each run. For example, a 512 x 512 image is converted to about 1,610 tokens.
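Preparing the base64-encoded image payload mentioned above can be sketched like this. The data-URL wrapper and the `"image"` field name are illustrative conventions, not confirmed field names; check the API reference for the exact request shape.

```python
import base64

def encode_image(image_bytes):
    """Base64-encode raw image bytes for use in a JSON request body."""
    return base64.b64encode(image_bytes).decode("ascii")

# Stand-in bytes (a PNG file signature); in practice you would read a
# real file with open("photo.png", "rb").read().
sample = b"\x89PNG\r\n\x1a\n"
encoded = encode_image(sample)

# Hypothetical request fragment: many embed APIs accept images as data URLs.
payload = {"image": f"data:image/png;base64,{encoded}"}
```

The service decodes the base64 string back into the original image bytes, so the encoding must be lossless, which `base64.b64encode` guarantees.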
Learn about Creating text embeddings in OCI Generative AI.
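The behavior of the Truncate parameter described above can be simulated with a short sketch. Whitespace-split words stand in for the model's real tokenizer here, so actual token counts will differ.

```python
# Sketch of the Truncate parameter: Start cuts off the beginning of the
# input, End cuts off the end, and None raises an error when the input
# is over the limit. Whitespace tokens approximate real tokenization.

def truncate_tokens(tokens, limit=512, mode="End"):
    if len(tokens) <= limit:
        return tokens
    if mode == "Start":
        return tokens[-limit:]   # keep the last `limit` tokens
    if mode == "End":
        return tokens[:limit]    # keep the first `limit` tokens
    # mode == "None": over-limit input is rejected with an error
    raise ValueError(f"input is {len(tokens)} tokens; over the {limit}-token limit")

words = [f"tok{i}" for i in range(600)]
kept_end = truncate_tokens(words, mode="End")
kept_start = truncate_tokens(words, mode="Start")
print(len(kept_end), kept_start[0])  # 512 tok88
```

With a 600-token input, End keeps tok0 through tok511, while Start drops the first 88 tokens and keeps the rest.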
Migrating from Embed 3 to Embed 4
If you're migrating from Embed 3 to Embed 4 and changing the vector size from 1,024 to 1,536 dimensions, use a new index to help avoid downtime.
- Create a new vector index: Create a new index or collection in your vector database that is configured for 1,536 dimensions.
- Re-embed the data: Reprocess your source documents with cohere.embed-v4.0 and set output_dimension=1536. Store the new embeddings in the new index.
- Update query logic: Update the application to use Embed 4 for incoming search queries. Use input_type="search_query" for queries and input_type="search_document" for stored documents.
- Cut over: After the new index is fully populated and tested, update your application to use the new 1,536-dimension index.
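The migration steps above can be sketched end to end. Everything in this sketch is a self-contained stand-in: `embed` is a hypothetical hash-based replacement for real cohere.embed-v4.0 calls with output_dimension=1536 (a real migration would pass input_type through to the service; this toy version ignores it), and the list-based index stands in for a real vector database.

```python
import hashlib
import math

NEW_DIM = 1536  # Embed 4 dimension for the new index (Embed 3 used 1,024)

def embed(texts, input_type, dim=NEW_DIM):
    """Hypothetical stand-in for cohere.embed-v4.0 with output_dimension=1536.

    A real migration would send input_type ("search_document" for stored
    documents, "search_query" for queries) to the service; this toy
    version ignores it and hashes the text deterministically.
    """
    out = []
    for t in texts:
        seed = hashlib.sha256(t.encode()).digest()
        out.append([(seed[i % 32] + i) % 255 / 254 for i in range(dim)])
    return out

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Steps 1-2: re-embed the source documents into a fresh 1,536-dimension index.
docs = ["blue whale facts", "gpu scheduling", "sourdough starter"]
new_index = list(zip(docs, embed(docs, input_type="search_document")))

# Steps 3-4: after cutover, queries use the same model against the new index.
[qvec] = embed(["blue whale facts"], input_type="search_query")
best = max(new_index, key=lambda pair: cosine(qvec, pair[1]))
print(best[0])  # the exact-match document ranks first
```

Keeping the old 1,024-dimension index live until the new one is populated and tested is what lets the final cutover happen without downtime.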