Cohere Embed Multilingual 3
Review performance benchmarks for the cohere.embed-multilingual-v3.0 (Cohere Embed Multilingual 3) model hosted on one Embed Cohere unit of a dedicated AI cluster in OCI Generative AI.
Text Embeddings
This scenario applies only to embedding models with text input. It mimics embedding generation as part of the data ingestion pipeline of a vector database. In each scenario, all requests are the same size: 96 documents, each with the same number of tokens. For example, the 512-token scenario mimics a collection of large PDF files, each with 30,000+ words, that a user would ingest into a vector DB.
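As an illustration of the request shape described above, the following minimal sketch groups a list of document chunks into requests of 96 documents each. The `batch_documents` helper and the sample corpus are hypothetical and not part of any OCI SDK. The sketch only shows the batching step, not the embedding call itself.

```python
from typing import Iterator, List, Sequence

# Documents per request, matching the benchmark scenario described above.
BATCH_SIZE = 96


def batch_documents(docs: Sequence[str], batch_size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `batch_size` documents (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(docs), batch_size):
        yield list(docs[start:start + batch_size])


# Hypothetical corpus of 200 document chunks.
chunks = [f"chunk {i}" for i in range(200)]
batches = list(batch_documents(chunks))
print([len(b) for b in batches])  # [96, 96, 8]
```

The last batch can be smaller than 96 documents, so a real ingestion run would not match the benchmark's fixed request size exactly.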
64 Tokens
The following tables show the benchmarks for a scenario of 96 documents, 64 tokens per document.
- The cohere.embed-multilingual-v3.0 model hosted on one Embed Cohere unit of a dedicated AI cluster for all regions except for the Saudi Arabia Central (Riyadh) region.

  | Concurrency | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) | Total Throughput (tokens/second) |
  |---|---|---|---|---|
  | 1 | 0.10 | 0.10 | 9.86 | 607.39 |
  | 2 | 0.10 | 0.10 | 9.66 | 1,192.60 |
  | 4 | 0.11 | 0.11 | 9.30 | 2,295.80 |
  | 8 | 0.11 | 0.11 | 8.69 | 4,296.59 |
  | 24 | 0.15 | 0.15 | 6.69 | 9,900.74 |
  | 48 | 0.19 | 0.19 | 5.10 | 15,114.13 |
  | 96 | 0.28 | 0.28 | 3.43 | 20,339.92 |

- The cohere.embed-multilingual-v3.0 model hosted on one Embed Cohere unit of a dedicated AI cluster for the Saudi Arabia Central (Riyadh) region.

  | Concurrency | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) | Total Throughput (tokens/second) |
  |---|---|---|---|---|
  | 1 | 0.09 | 0.09 | 11.29 | 697.03 |
  | 2 | 0.09 | 0.09 | 11.06 | 1,365.27 |
  | 4 | 0.09 | 0.09 | 10.58 | 2,616.91 |
  | 8 | 0.10 | 0.10 | 9.78 | 4,818.35 |
  | 24 | 0.14 | 0.14 | 7.18 | 10,639.81 |
  | 48 | 0.18 | 0.18 | 5.32 | 15,775.42 |
  | 96 | 0.28 | 0.28 | 3.46 | 20,495.80 |
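One way to read these tables is to pick the concurrency level that gives the highest total throughput while staying within a latency budget. The sketch below does this for the 64-token figures for all regions except Saudi Arabia Central (Riyadh). The `Row` type, the `best_row` helper, and the 0.15-second budget are illustrative assumptions and not part of the benchmark methodology.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    concurrency: int
    latency_s: float      # request-level latency (seconds)
    rps: float            # request-level throughput (requests per second)
    tokens_per_s: float   # total throughput (tokens/second)


# 64-token scenario, all regions except Saudi Arabia Central (Riyadh).
ROWS_64 = [
    Row(1, 0.10, 9.86, 607.39),
    Row(2, 0.10, 9.66, 1192.60),
    Row(4, 0.11, 9.30, 2295.80),
    Row(8, 0.11, 8.69, 4296.59),
    Row(24, 0.15, 6.69, 9900.74),
    Row(48, 0.19, 5.10, 15114.13),
    Row(96, 0.28, 3.43, 20339.92),
]


def best_row(rows: List[Row], max_latency_s: float) -> Optional[Row]:
    """Return the row with the highest total throughput within the latency budget."""
    candidates = [r for r in rows if r.latency_s <= max_latency_s]
    return max(candidates, key=lambda r: r.tokens_per_s) if candidates else None


# Assumed latency budget of 0.15 seconds, chosen only for illustration.
choice = best_row(ROWS_64, max_latency_s=0.15)
print(choice.concurrency, choice.tokens_per_s)  # 24 9900.74
```

The helper returns `None` when no row meets the budget, so callers can tell "no feasible setting" apart from a valid choice.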
128 Tokens
The following tables show the benchmarks for a scenario of 96 documents, 128 tokens per document.
- The cohere.embed-multilingual-v3.0 model hosted on one Embed Cohere unit of a dedicated AI cluster for all regions except for the Saudi Arabia Central (Riyadh) region.

  | Concurrency | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) | Total Throughput (tokens/second) |
  |---|---|---|---|---|
  | 1 | 0.10 | 0.10 | 9.78 | 1,243.08 |
  | 2 | 0.10 | 0.10 | 9.55 | 2,428.37 |
  | 4 | 0.11 | 0.11 | 9.16 | 4,660.31 |
  | 8 | 0.12 | 0.12 | 8.42 | 8,561.42 |
  | 24 | 0.15 | 0.15 | 6.31 | 19,255.04 |
  | 48 | 0.21 | 0.21 | 4.62 | 28,172.80 |
  | 96 | 0.31 | 0.31 | 3.01 | 36,706.87 |

- The cohere.embed-multilingual-v3.0 model hosted on one Embed Cohere unit of a dedicated AI cluster for the Saudi Arabia Central (Riyadh) region.

  | Concurrency | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) | Total Throughput (tokens/second) |
  |---|---|---|---|---|
  | 1 | 0.09 | 0.09 | 11.28 | 1,435.60 |
  | 2 | 0.09 | 0.09 | 10.92 | 2,773.25 |
  | 4 | 0.09 | 0.09 | 10.36 | 5,284.49 |
  | 8 | 0.10 | 0.10 | 9.41 | 9,575.47 |
  | 24 | 0.14 | 0.14 | 6.78 | 20,686.00 |
  | 48 | 0.20 | 0.20 | 4.82 | 29,426.82 |
  | 96 | 0.35 | 0.35 | 2.64 | 32,277.12 |
256 Tokens
The following table shows the benchmarks for a scenario of 96 documents, 256 tokens per document.
- The cohere.embed-multilingual-v3.0 model hosted on one Embed Cohere unit of a dedicated AI cluster for all regions except for the Saudi Arabia Central (Riyadh) region.

  | Concurrency | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) | Total Throughput (tokens/second) |
  |---|---|---|---|---|
  | 1 | 0.10 | 0.10 | 9.65 | 2,493.61 |
  | 2 | 0.10 | 0.10 | 9.37 | 4,840.64 |
  | 4 | 0.11 | 0.11 | 8.81 | 9,107.30 |
  | 8 | 0.12 | 0.12 | 7.88 | 16,286.08 |
  | 24 | 0.17 | 0.17 | 5.50 | 34,074.47 |
  | 48 | 0.24 | 0.24 | 3.85 | 47,783.85 |
  | 96 | 0.43 | 0.43 | 2.16 | 53,652.55 |
512 Tokens
The following tables show the benchmarks for a scenario of 96 documents, 512 tokens per document.
- The cohere.embed-multilingual-v3.0 model hosted on one Embed Cohere unit of a dedicated AI cluster for all regions except for the Saudi Arabia Central (Riyadh) region.

  | Concurrency | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) | Total Throughput (tokens/second) |
  |---|---|---|---|---|
  | 1 | 0.10 | 0.10 | 9.21 | 4,791.85 |
  | 2 | 0.11 | 0.11 | 8.74 | 9,094.38 |
  | 4 | 0.12 | 0.12 | 7.76 | 16,165.29 |
  | 8 | 0.15 | 0.15 | 6.39 | 26,582.37 |
  | 24 | 0.23 | 0.23 | 4.03 | 50,284.92 |
  | 48 | 0.37 | 0.37 | 2.44 | 61,014.65 |
  | 96 | 0.63 | 0.63 | 1.42 | 71,048.17 |

- The cohere.embed-multilingual-v3.0 model hosted on one Embed Cohere unit of a dedicated AI cluster for the Saudi Arabia Central (Riyadh) region.

  | Concurrency | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) | Total Throughput (tokens/second) |
  |---|---|---|---|---|
  | 1 | 0.09 | 0.09 | 10.34 | 5,379.92 |
  | 2 | 0.10 | 0.10 | 9.79 | 10,190.27 |
  | 4 | 0.11 | 0.11 | 8.67 | 18,059.39 |
  | 8 | 0.13 | 0.13 | 7.00 | 29,158.44 |
  | 24 | 0.22 | 0.22 | 4.20 | 52,493.14 |
  | 48 | 0.36 | 0.36 | 2.36 | 58,908.72 |
  | 96 | 0.62 | 0.62 | 1.44 | 71,746.46 |
1,024 Tokens
The following table shows the benchmarks for a scenario of 96 documents, 1,024 tokens per document.
- The cohere.embed-multilingual-v3.0 model hosted on one Embed Cohere unit of a dedicated AI cluster for the Saudi Arabia Central (Riyadh) region.

  | Concurrency | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) | Total Throughput (tokens/second) |
  |---|---|---|---|---|
  | 1 | 0.09 | 0.09 | 8.96 | 9,348.26 |
  | 2 | 0.10 | 0.10 | 8.25 | 17,233.64 |
  | 4 | 0.12 | 0.12 | 6.87 | 28,670.18 |
  | 8 | 0.15 | 0.15 | 5.37 | 44,877.47 |
  | 24 | 0.28 | 0.28 | 2.86 | 71,733.22 |
  | 48 | 0.53 | 0.53 | 1.52 | 76,050.39 |
  | 96 | 0.99 | 0.99 | 0.80 | 80,597.64 |
2,048 Tokens
The following tables show the benchmarks for a scenario of 96 documents, 2,048 tokens per document.
- The cohere.embed-multilingual-v3.0 model hosted on one Embed Cohere unit of a dedicated AI cluster for all regions except for the Saudi Arabia Central (Riyadh) region.

  | Concurrency | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) | Total Throughput (tokens/second) |
  |---|---|---|---|---|
  | 1 | 0.11 | 0.11 | 8.16 | 17,069.42 |
  | 2 | 0.13 | 0.13 | 7.17 | 30,023.10 |
  | 4 | 0.15 | 0.15 | 5.82 | 48,737.45 |
  | 8 | 0.21 | 0.21 | 4.21 | 70,449.76 |
  | 24 | 0.48 | 0.48 | 1.79 | 90,090.74 |
  | 48 | 0.93 | 0.93 | 0.92 | 92,553.18 |
  | 96 | 1.68 | 1.68 | 0.51 | 101,574.61 |

- The cohere.embed-multilingual-v3.0 model hosted on one Embed Cohere unit of a dedicated AI cluster for the Saudi Arabia Central (Riyadh) region.

  | Concurrency | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) | Total Throughput (tokens/second) |
  |---|---|---|---|---|
  | 1 | 0.10 | 0.10 | 6.27 | 13,116.54 |
  | 2 | 0.12 | 0.12 | 5.65 | 23,647.68 |
  | 4 | 0.14 | 0.14 | 4.70 | 39,382.39 |
  | 8 | 0.20 | 0.20 | 3.32 | 55,564.63 |
  | 24 | 0.48 | 0.48 | 1.37 | 68,783.08 |
  | 48 | 0.89 | 0.89 | 0.73 | 72,944.21 |
  | 96 | 1.66 | 1.66 | 0.39 | 77,928.29 |
8,096 Tokens
The following tables show the benchmarks for a scenario of 96 documents, 8,096 tokens per document.
- The cohere.embed-multilingual-v3.0 model hosted on one Embed Cohere unit of a dedicated AI cluster for all regions except for the Saudi Arabia Central (Riyadh) region.

  | Concurrency | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) | Total Throughput (tokens/second) |
  |---|---|---|---|---|
  | 1 | 0.15 | 0.15 | 5.40 | 44,724.35 |
  | 2 | 0.20 | 0.20 | 4.05 | 67,118.88 |
  | 4 | 0.31 | 0.31 | 2.59 | 85,752.57 |
  | 8 | 0.59 | 0.59 | 1.41 | 93,369.42 |
  | 24 | 1.56 | 1.56 | 0.53 | 105,492.68 |
  | 48 | 2.84 | 2.84 | 0.29 | 115,098.50 |
  | 96 | 5.27 | 5.27 | 0.15 | 121,706.97 |

- The cohere.embed-multilingual-v3.0 model hosted on one Embed Cohere unit of a dedicated AI cluster for the Saudi Arabia Central (Riyadh) region.

  | Concurrency | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) | Total Throughput (tokens/second) |
  |---|---|---|---|---|
  | 1 | 0.14 | 0.14 | 1.32 | 10,962.57 |
  | 2 | 0.19 | 0.19 | 1.06 | 17,550.18 |
  | 4 | 0.29 | 0.29 | 0.76 | 25,090.59 |
  | 8 | 0.56 | 0.56 | 0.38 | 25,436.88 |
  | 24 | 1.52 | 1.52 | 0.15 | 30,238.80 |
  | 48 | 2.79 | 2.79 | 0.11 | 43,355.58 |
  | 96 | 5.21 | 5.21 | 0.05 | 42,052.47 |
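The two 8,096-token tables above show similar latencies but noticeably different total throughput between regions. A short sketch like the following can quantify that gap per concurrency level. The dictionary names and the `throughput_ratio` helper are hypothetical, and the figures are copied from the tables above.

```python
from typing import Dict

# Concurrency -> total throughput (tokens/second), 8,096-token scenario.
OTHER_REGIONS: Dict[int, float] = {
    1: 44724.35, 2: 67118.88, 4: 85752.57, 8: 93369.42,
    24: 105492.68, 48: 115098.50, 96: 121706.97,
}
RIYADH: Dict[int, float] = {
    1: 10962.57, 2: 17550.18, 4: 25090.59, 8: 25436.88,
    24: 30238.80, 48: 43355.58, 96: 42052.47,
}


def throughput_ratio(a: Dict[int, float], b: Dict[int, float]) -> Dict[int, float]:
    """Ratio a/b for every concurrency level present in both tables."""
    return {c: a[c] / b[c] for c in sorted(a.keys() & b.keys())}


ratios = throughput_ratio(RIYADH, OTHER_REGIONS)
for concurrency, ratio in ratios.items():
    print(f"concurrency {concurrency:>2}: Riyadh / other regions = {ratio:.2f}")
```

Only concurrency levels present in both tables are compared, which also makes the helper usable for scenarios where one region reports fewer rows.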
32,000 Tokens
The following tables show the benchmarks for a scenario of 96 documents, 32,000 tokens per document.
- The cohere.embed-multilingual-v3.0 model hosted on one Embed Cohere unit of a dedicated AI cluster for all regions except for the Saudi Arabia Central (Riyadh) region.

  | Concurrency | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) | Total Throughput (tokens/second) |
  |---|---|---|---|---|
  | 1 | 0.38 | 0.38 | 1.90 | 62,367.46 |
  | 2 | 0.57 | 0.57 | 1.30 | 85,209.04 |
  | 4 | 1.05 | 1.05 | 0.75 | 97,631.79 |
  | 8 | 1.91 | 1.91 | 0.42 | 109,411.13 |
  | 24 | 5.12 | 5.12 | 0.16 | 122,915.27 |

- The cohere.embed-multilingual-v3.0 model hosted on one Embed Cohere unit of a dedicated AI cluster for the Saudi Arabia Central (Riyadh) region.

  | Concurrency | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) | Total Throughput (tokens/second) |
  |---|---|---|---|---|
  | 1 | 0.36 | 0.36 | 0.01 | 198.49 |
  | 8 | 2.13 | 2.13 | 0.37 | 94,825.57 |
  | 24 | 5.78 | 5.78 | 0.14 | 104,466.77 |
128,000 Tokens
The following tables show the benchmarks for a scenario of 96 documents, 128,000 tokens per document.
- The cohere.embed-multilingual-v3.0 model hosted on one Embed Cohere unit of a dedicated AI cluster for all regions except for the Saudi Arabia Central (Riyadh) region.

  | Concurrency | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) | Total Throughput (tokens/second) |
  |---|---|---|---|---|
  | 1 | 1.00 | 1.00 | 0.99 | 121,894.93 |
  | 2 | 2.26 | 2.26 | 0.44 | 134,957.95 |
  | 4 | 3.36 | 3.36 | 0.29 | 146,216.07 |

- The cohere.embed-multilingual-v3.0 model hosted on one Embed Cohere unit of a dedicated AI cluster for the Saudi Arabia Central (Riyadh) region.

  | Concurrency | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) | Total Throughput (tokens/second) |
  |---|---|---|---|---|
  | 1 | 1.10 | 1.10 | 0.90 | 109,801.35 |
  | 2 | 2.01 | 2.01 | 0.49 | 120,154.57 |
  | 4 | 3.74 | 3.74 | 0.26 | 129,080.96 |