The Challenge of AI Hallucinations and Vectara’s Solution

The issue of AI hallucinations presents a significant challenge in the adoption of enterprise AI. Organizations strive to avoid generating inaccurate results from generative AI efforts. Vectara, a company that emerged from stealth in October 2022, is among the organizations dedicated to solving the problem of AI hallucination. Led by one of the co-founders of Big Data vendor Cloudera, Vectara has been making strides in this area.

Vectara’s Generative AI Platform

In May, Vectara updated its Generative AI platform with a grounded search capability. The enhancement is meant to ground retrieval-augmented generation (RAG) results in source content rather than in whatever the model produces on its own. The platform has now taken another step toward reducing the risk of AI hallucination with the introduction of its new Boomerang technology.

The Boomerang Technology

Boomerang, which Vectara describes as a neural information retrieval model, takes a new approach to generating vector embeddings, which are crucial for large language models (LLMs). According to the company, this enables a higher degree of accuracy with less hallucination. Amin Ahmad, co-founder and CTO of Vectara, explains, “It’s a retrieval mode, it’s fundamentally there to serve the following purpose, the user sends a query into some kind of knowledge base and relevant information comes back out of the knowledge base.” He likens this round trip to the flight of a boomerang.
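To make the query-in, relevant-information-out loop concrete, here is a minimal Python sketch of embedding-based retrieval. It is not Boomerang and does not reflect Vectara's implementation: the `embed` function is a toy bag-of-words stand-in for a neural embedding model, and the names `embed`, `cosine`, `retrieve`, and `knowledge_base` are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector.
    A neural model would return a dense vector that captures meaning."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, knowledge_base, top_k=1):
    """Return the top_k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        knowledge_base,
        key=lambda doc: cosine(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical knowledge base used only for this example.
knowledge_base = [
    "refunds are issued within 30 days of purchase",
    "the office is closed on public holidays",
    "support is available by email and phone",
]

print(retrieve("when are refunds issued", knowledge_base))
```

Word overlap is a crude proxy for meaning, which is exactly the gap a learned embedding model is meant to close: a real model can match a query to a passage that shares no words with it, or that is written in another language.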

Vectara’s Boomerang engine enhances the accuracy of its GenAI platform and builds upon the company’s grounded generation approach. The grounded generation method involves placing data in a special vector database or meaning space. According to Amr Awadallah, co-founder and CEO of Vectara, if the data cannot be properly mapped within this meaning space, the system will fail to provide accurate information in response to user queries.
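As a rough illustration of why retrieval quality matters for grounded generation, the Python sketch below assembles a prompt that restricts a language model to retrieved passages. This is a generic RAG pattern under stated assumptions, not Vectara's method: the function name `build_grounded_prompt`, the prompt wording, and the choice to return `None` when nothing was retrieved are all hypothetical.

```python
def build_grounded_prompt(query, passages):
    """Build a prompt that asks the model to answer only from the passages.

    Returns None when no passages were retrieved, so the caller can decline
    to answer instead of letting the model guess (hypothetical design choice).
    """
    if not passages:
        return None
    # Number the passages so an answer can cite its sources.
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the numbered passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {query}"
    )

prompt = build_grounded_prompt(
    "When are refunds issued?",
    ["refunds are issued within 30 days of purchase"],
)
print(prompt)
```

If retrieval returns the wrong passages, the model is grounded in the wrong facts, which is why the accuracy of the embedding step feeds directly into the accuracy of the final answer.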

Boomerang, a model developed by Vectara, generates vector embeddings that represent the meaning behind words, irrespective of language. Creating vector embeddings is a critical step for all major LLM vendors. OpenAI, for example, has its own ada embedding models, which it has improved in recent years.

Awadallah explains that Boomerang is an upgraded engine compared to what Vectara had before, delivering a higher degree of quality and accuracy for vector embeddings. The core benefit of Boomerang for enterprises is its ability to produce better facts. Awadallah states, “Because now we have way better facts, everything else improves, the hallucination probability goes down and the explainability becomes way better on the output side.”

The precise method Boomerang uses to create better vector embeddings is complex. Ahmad says the new model was derived from the previous one by applying numerous new techniques and expanding the training data. Vectara says these techniques came out of extensive research and experimentation, and it plans to publish research papers describing the methods behind the Boomerang embedding approach.

Boomerang, according to Vectara, outperforms larger models in cross-lingual retrieval and can comprehend content in hundreds of languages and dialects. While the updated platform makes significant progress in reducing the risk of hallucination, Vectara acknowledges that there is more work to be done to reach a hallucination rate of 0%. The company says it remains committed to further research on minimizing hallucination, a goal that is critical in business contexts.
