Cohere Embed 4
Cohere Embed 4 (cohere.embed-v4.0) is a multimodal embedding model that generates embeddings from text and image input. Image input is available through the API only.
Regions for this Model
For supported regions, endpoint types (on-demand or dedicated AI clusters), and hosting (OCI Generative AI or external calls) for this model, see the Models by Region page. For details about the regions, see the Generative AI Regions page.
Access this Model
The API links list the endpoints for all supported commercial, sovereign, and government regions.
Key Features
- Input limits:
  - Console: Up to 96 text inputs per run, with each text input under 512 tokens
  - SDK and API: Up to 128,000 total input tokens per run
- Input mode:
  - Input text or an image, but not both
  - For image embeddings, only one image is allowed per run
- Image input (available through the API only):
  - Requires a base64-encoded image
  - A 512 x 512 image is about 1,610 tokens
- Language support:
  - Text: English and multilingual
  - Image: English only
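Because the Console accepts at most 96 text inputs per run, a larger corpus has to be embedded in batches. The following sketch shows one way to split a corpus into runs that respect that limit; the helper name is illustrative and is not part of any OCI SDK.

```python
# Sketch: split a corpus into embedding runs of at most 96 inputs each
# (the documented Console limit). batch_inputs is a hypothetical helper.

def batch_inputs(texts, max_inputs_per_run=96):
    """Yield lists of at most max_inputs_per_run texts per run."""
    for start in range(0, len(texts), max_inputs_per_run):
        yield texts[start:start + max_inputs_per_run]

corpus = [f"sentence {i}" for i in range(200)]
runs = list(batch_inputs(corpus))
print([len(r) for r in runs])  # three runs: 96 + 96 + 8
```

Each yielded batch can then be sent as one embedding run.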
On-Demand Mode
On-demand mode is pay-as-you-go and is useful for experimentation, proof-of-concept work, and model evaluation. On the pricing page, this model is listed as:
| Model Name | OCI Model Name | Pricing Page Product Name |
|---|---|---|
| Cohere Embed 4 | cohere.embed-v4.0 | Embed Cohere |
Dynamic Throttling Limit Change for On-Demand Mode
OCI Generative AI dynamically adjusts the request throttling limit for each active tenancy based on model demand and system capacity to optimize resource allocation and ensure fair access. Because of dynamic throttling, rate limits are undocumented and can change to meet system-wide demand.
Because of the dynamic throttling limit change, we recommend implementing a back-off strategy, which involves delaying requests after a rejection. Without one, repeated rapid requests can lead to further rejections, increased latency, and potential temporary blocking of the client by the Generative AI service. A back-off strategy, such as exponential back-off, distributes requests more evenly, reduces load, and improves retry success, following industry best practices and enhancing the overall stability and performance of your integration with the service.
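An exponential back-off with jitter can be sketched as follows. This is a minimal, self-contained illustration: `send_request` stands in for any call to the service that raises on a throttling rejection (for example, an HTTP 429); none of these names come from the OCI SDK.

```python
import random
import time

def call_with_backoff(send_request, max_retries=5, base_delay=0.5, max_delay=8.0):
    """Retry a throttled call with exponential back-off and full jitter.

    send_request is any callable that raises when the service rejects
    the request; the names here are illustrative, not an OCI SDK API.
    """
    for attempt in range(max_retries + 1):
        try:
            return send_request()
        except Exception:
            if attempt == max_retries:
                raise  # give up after the final retry
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))  # full jitter spreads retries

# Demo: a call that is rejected twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429: too many requests")  # simulated throttle
    return "ok"

result = call_with_backoff(flaky, base_delay=0.01)
print(result)  # ok, after two retried rejections
```

Doubling the delay on each attempt while capping it at `max_delay` keeps retry traffic from amplifying a load spike.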
Dedicated AI Cluster for the Model
To use this model with a dedicated AI cluster, create an endpoint for the model in a supported region.
| Base Model | Fine-Tuning Cluster | Hosting Cluster | Pricing Page Information | Request Cluster Limit Increase |
|---|---|---|---|---|
| Cohere Embed 4 | Not available for fine-tuning | Unit size: Embed Cohere | Pricing page product name: Embed Cohere | Limit name: dedicated-unit-embed-cohere-count |
If you don't have enough hosting capacity, request an increase for the dedicated-unit-embed-cohere-count limit.
Endpoint Rules for Clusters
- A dedicated AI cluster can hold up to 50 endpoints.
- Use these endpoints to create aliases that all point either to the same base model or to the same version of a custom model, but not both types.
- Several endpoints for the same model make it easy to assign them to different users or purposes.
| Hosting Cluster Unit Size | Endpoint Rules |
|---|---|
| Embed Cohere | Up to 50 endpoints per cluster |

- To increase the call volume supported by a hosting cluster, increase its instance count by editing the dedicated AI cluster. See Updating a Dedicated AI Cluster.
- For more than 50 endpoints per cluster, request an increase for the endpoint-per-dedicated-unit-count limit. See Creating a Limit Increase Request and Service Limits for Generative AI.
Cluster Performance Benchmarks
Review the Cohere Embed 4 cluster performance benchmarks for different use cases.
OCI Release and Retirement Dates
For release and retirement dates and replacement model options, see the following pages based on the mode (on-demand or dedicated):
Input Data for Text Embeddings
Input data for creating text embeddings has the following requirements:
- You can add sentences, phrases, or paragraphs for embeddings either one phrase at a time, or by uploading a file.
- Only files with a .txt extension are allowed.
- If you use an input file, each input sentence, phrase, or paragraph in the file must be separated with a newline character.
- A maximum of 96 inputs is allowed for each run.
- In the Console, each input must be less than 512 tokens for the text-only models.
- If an input is too long, select whether to cut off the start or the end of the text to fit within the token limit by setting the Truncate parameter to Start or End. If an input exceeds the 512 token limit and the Truncate parameter is set to None, you get an error message.
- For the text and image models, you can have files and inputs that all add up to 128,000 tokens.
- For the text and image embed models, such as Cohere Embed English Image V3, you can add either text or one image, but not both. Image input is available only through the API, not the Console. For the API, input a base64-encoded image in each run. For example, a 512 x 512 image is converted to about 1,610 tokens.
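Preparing the base64-encoded image payload mentioned above can be sketched like this. The data-URL wrapper and the `"image"` field name are illustrative conventions, not confirmed field names; check the API reference for the exact request shape.

```python
import base64

def encode_image(image_bytes):
    """Base64-encode raw image bytes for use in a JSON request body."""
    return base64.b64encode(image_bytes).decode("ascii")

# Stand-in bytes (a PNG file signature); in practice you would read a
# real file with open("photo.png", "rb").read().
sample = b"\x89PNG\r\n\x1a\n"
encoded = encode_image(sample)

# Hypothetical request fragment: many embed APIs accept images as data URLs.
payload = {"image": f"data:image/png;base64,{encoded}"}
```

The service decodes the base64 string back into the original image bytes, so the encoding must be lossless, which `base64.b64encode` guarantees.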
Learn about Creating text embeddings in OCI Generative AI.
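The behavior of the Truncate parameter described above can be simulated with a short sketch. Whitespace-split words stand in for the model's real tokenizer here, so actual token counts will differ.

```python
# Sketch of the Truncate parameter: Start cuts off the beginning of the
# input, End cuts off the end, and None raises an error when the input
# is over the limit. Whitespace tokens approximate real tokenization.

def truncate_tokens(tokens, limit=512, mode="End"):
    if len(tokens) <= limit:
        return tokens
    if mode == "Start":
        return tokens[-limit:]   # keep the last `limit` tokens
    if mode == "End":
        return tokens[:limit]    # keep the first `limit` tokens
    # mode == "None": over-limit input is rejected with an error
    raise ValueError(f"input is {len(tokens)} tokens; over the {limit}-token limit")

words = [f"tok{i}" for i in range(600)]
kept_end = truncate_tokens(words, mode="End")
kept_start = truncate_tokens(words, mode="Start")
print(len(kept_end), kept_start[0])  # 512 tok88
```

With a 600-token input, End keeps tok0 through tok511, while Start drops the first 88 tokens and keeps the rest.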
Migrating from Embed 3 to Embed 4
If you're migrating from Embed 3 to Embed 4 and changing the vector size from 1,024 to 1,536 dimensions, use a new index to help avoid downtime.
- Create a new vector index: Create a new index or collection in your vector database that is configured for 1,536 dimensions.
- Re-embed the data: Reprocess your source documents with cohere.embed-v4.0 and set output_dimension=1536. Store the new embeddings in the new index.
- Update query logic: Update the application to use Embed 4 for incoming search queries. Use input_type="search_query" for queries and input_type="search_document" for stored documents.
- Cut over: After the new index is fully populated and tested, update your application to use the new 1,536-dimension index.
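The migration steps above can be sketched end to end. Everything in this sketch is a self-contained stand-in: `embed` is a hypothetical hash-based replacement for real cohere.embed-v4.0 calls with output_dimension=1536 (a real migration would pass input_type through to the service; this toy version ignores it), and the list-based index stands in for a real vector database.

```python
import hashlib
import math

NEW_DIM = 1536  # Embed 4 dimension for the new index (Embed 3 used 1,024)

def embed(texts, input_type, dim=NEW_DIM):
    """Hypothetical stand-in for cohere.embed-v4.0 with output_dimension=1536.

    A real migration would send input_type ("search_document" for stored
    documents, "search_query" for queries) to the service; this toy
    version ignores it and hashes the text deterministically.
    """
    out = []
    for t in texts:
        seed = hashlib.sha256(t.encode()).digest()
        out.append([(seed[i % 32] + i) % 255 / 254 for i in range(dim)])
    return out

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Steps 1-2: re-embed the source documents into a fresh 1,536-dimension index.
docs = ["blue whale facts", "gpu scheduling", "sourdough starter"]
new_index = list(zip(docs, embed(docs, input_type="search_document")))

# Steps 3-4: after cutover, queries use the same model against the new index.
[qvec] = embed(["blue whale facts"], input_type="search_query")
best = max(new_index, key=lambda pair: cosine(qvec, pair[1]))
print(best[0])  # the exact-match document ranks first
```

Keeping the old 1,024-dimension index live until the new one is populated and tested is what lets the final cutover happen without downtime.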