Toronto-based AI startup Cohere has recently released Embed V3, the latest version of its embedding model. This new version is specifically designed for semantic search and applications that leverage large language models (LLMs).
Embedding models, which transform data into numerical representations known as “embeddings,” have gained significant attention due to the rise of LLMs and their potential use cases in enterprise applications.
Embed V3 competes with OpenAI’s Ada and various open-source alternatives. According to Cohere, it offers superior performance and more efficient data compression, with the goal of reducing the operational costs of enterprise LLM applications.
The Importance of Embeddings in Large Language Models
Embeddings play a pivotal role in various tasks, including retrieval augmented generation (RAG), which is a key application of large language models in the enterprise sector.
RAG enables developers to provide context to LLMs at runtime by retrieving information from external sources like user manuals, email and chat histories, articles, and other relevant documents that were not part of the model’s original training data.
To perform RAG, companies first create embeddings of their documents and store them in a vector database. Each time a user queries the model, the AI system computes the embedding of the prompt and compares it against the embeddings stored in the vector database. It then retrieves the documents most similar to the prompt and prepends their content to the user’s prompt, giving the LLM the context it needs.
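As a rough sketch of that pipeline, the example below uses Cohere’s Python SDK with a plain NumPy array standing in for a real vector database. The model name embed-english-v3.0 and the input_type parameter follow Cohere’s documented Embed V3 API, but the documents and response handling are illustrative assumptions (response shapes can differ between SDK versions):

```python
import numpy as np
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

docs = [
    "To reset the router, hold the recessed button for 10 seconds.",
    "Refunds are available within 30 days of purchase.",
]

# 1. Embed the documents once and store the vectors; a NumPy array
#    stands in for a real vector database here.
doc_vecs = np.array(
    co.embed(
        texts=docs,
        model="embed-english-v3.0",
        input_type="search_document",
    ).embeddings
)

def retrieve(query: str, top_k: int = 1) -> list[str]:
    # 2. Embed the incoming prompt with the matching query input type.
    q = np.array(
        co.embed(
            texts=[query],
            model="embed-english-v3.0",
            input_type="search_query",
        ).embeddings[0]
    )
    # 3. Rank stored documents by cosine similarity to the prompt.
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(-sims)[:top_k]]

# 4. Prepend the retrieved documents to the user's prompt as context.
question = "How do I reset my router?"
context = "\n".join(retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}"
```

Note that Embed V3 asks callers to declare whether a text is a document or a query via input_type, letting the model embed the two sides of the retrieval problem differently.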
RAG can help address challenges faced by LLMs, such as limited access to up-to-date information and the generation of false information (referred to as “hallucinations”). However, like other search systems, one of the significant challenges of RAG is finding the most relevant documents for a user’s query.
Embed V3: Superior Performance in Document-to-Query Matching
Previous embedding models have encountered difficulties when dealing with noisy data sets, where some documents may have been incorrectly crawled or lack useful information.
Cohere’s Embed V3, by contrast, matches documents to queries more accurately because its embeddings capture more precise semantic information about a document’s content. For example, for a query about “COVID-19 symptoms,” Embed V3 would rank a document that discusses specific symptoms such as “high temperature,” “continuous cough,” or “loss of smell or taste” above a document that merely states COVID-19 has many symptoms.
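One way to sanity-check that claim is to embed the query and both kinds of document and compare cosine similarities. This sketch reuses the co client from the example above; the two candidate documents are invented for illustration:

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = co.embed(
    texts=["COVID-19 symptoms"],
    model="embed-english-v3.0",
    input_type="search_query",
).embeddings[0]

specific_vec, vague_vec = co.embed(
    texts=[
        "Typical COVID-19 symptoms include a high temperature, a new "
        "continuous cough, and a loss of smell or taste.",
        "COVID-19 has many symptoms.",
    ],
    model="embed-english-v3.0",
    input_type="search_document",
).embeddings

# Cohere's claim is that the document naming concrete symptoms
# should score higher against the query than the vague one.
print(cosine(query_vec, specific_vec), cosine(query_vec, vague_vec))
```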
Cohere claims that Embed V3 outperforms other models, including OpenAI’s ada-002, in standard benchmarks used to evaluate the performance of embedding models.
Embed V3 is available in different embedding sizes and includes a multilingual version that can match queries to documents across languages. This means it can locate French documents that match an English query, for example.
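A cross-lingual match can be sketched the same way with the multilingual variant (embed-multilingual-v3.0 is Cohere’s documented multilingual Embed V3 model; the cosine helper and co client come from the earlier examples, and the French sentence is invented):

```python
# English query against a French document using the multilingual model.
query_vec = co.embed(
    texts=["What is the refund policy?"],
    model="embed-multilingual-v3.0",
    input_type="search_query",
).embeddings[0]

doc_vec = co.embed(
    texts=["Les remboursements sont possibles dans les 30 jours suivant l'achat."],
    model="embed-multilingual-v3.0",
    input_type="search_document",
).embeddings[0]

print(cosine(query_vec, doc_vec))  # high score despite the language mismatch
```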
The versatility of Embed V3 extends to applications such as search, classification, and clustering. It has also demonstrated superior performance on advanced use cases, including multi-hop RAG queries, where the system must identify the separate questions within a user’s prompt and retrieve relevant documents for each one. Embed V3 returns higher-quality results within its top-10 retrieved documents, reducing the need for repeated queries to the vector database.
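A multi-hop setup might be sketched on top of the retrieve() helper from the earlier example. How the sub-queries are extracted from the prompt is out of scope here (in practice an LLM call might do it), so they are hard-coded assumptions:

```python
# A prompt containing two distinct information needs.
prompt = (
    "Compare the battery life of the X100 with the charging time of the Z200."
)

# In practice a system might extract these with an LLM call;
# here they are hard-coded for illustration.
sub_queries = ["X100 battery life", "Z200 charging time"]

context = []
for sq in sub_queries:
    context.extend(retrieve(sq, top_k=3))  # one vector-DB lookup per hop
```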
Furthermore, Embed V3 pairs well with “reranking,” a feature Cohere recently added to its API that lets search applications reorder existing search results by semantic relevance. A Cohere spokesperson said, “Rerank is especially strong for queries and documents that address multiple aspects, something embedding models struggle with due to their design. However, Rerank requires that an initial set of documents is passed as input. It is critical that the most relevant documents are part of this top list. A better embedding model like Embed V3 ensures that no relevant documents are missed in this shortlist.”
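A minimal Rerank call might look like the following, reusing the co client and retrieve() helper from above. The model name rerank-english-v2.0 matches Cohere’s documented Rerank model, but the response shape shown (response.results) varies between SDK versions, so treat this as a sketch:

```python
# First-stage retrieval produces a shortlist; Rerank reorders it.
query = "How do I reset my router?"
candidates = retrieve(query, top_k=10)

response = co.rerank(
    model="rerank-english-v2.0",
    query=query,
    documents=candidates,
    top_n=3,
)

for hit in response.results:
    # Each result carries the candidate's original index and a relevance score.
    print(hit.index, round(hit.relevance_score, 3))
```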
In addition to its performance benefits, Embed V3 can help reduce the costs of running vector databases. Cohere trained the model with a three-stage process that includes a compression-aware training method. According to the spokesperson, the cost of the vector database can significantly exceed the cost of computing the embeddings. Embed V3’s compression-aware training makes it well suited to vector compression methods, which can cut those costs severalfold while maintaining high search quality.
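Cohere has not published the details of its compression scheme, so the following is only a generic illustration of scalar int8 quantization, one common vector-compression method: each dimension is stored in one byte instead of four, cutting vector-database storage roughly 4x while approximately preserving similarity rankings:

```python
import numpy as np

rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(1000, 1024)).astype(np.float32)  # stand-in embeddings
query = rng.normal(size=1024).astype(np.float32)

def quantize_int8(vecs):
    # Per-dimension scale mapping values into the int8 range;
    # storage drops 4x versus float32.
    scale = np.abs(vecs).max(axis=0) / 127.0
    return np.round(vecs / scale).astype(np.int8), scale

def approx_scores(q, int8_vecs, scale):
    # Fold the per-dimension scale into the query so scoring
    # remains a single matrix-vector product.
    return int8_vecs.astype(np.float32) @ (q * scale)

compressed, scale = quantize_int8(doc_vecs)
exact = doc_vecs @ query
approx = approx_scores(query, compressed, scale)
print(np.corrcoef(exact, approx)[0, 1])  # close to 1: ranking is preserved
```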
Overall, Cohere’s Embed V3 offers superior performance, enhanced data compression, and cost reduction benefits for enterprise applications utilizing large language models.