Mistral AI, the Paris-based startup known for its Word Art-style logo and record-setting $118 million seed round, has announced the release of its first large language model, Mistral 7B. With 7.3 billion parameters, the model outperforms much larger offerings, and the company bills it as the most powerful language model of its size to date. Notably, it handles English-language tasks while also delivering natural coding capabilities, making it a versatile option for a range of enterprise use cases.
Mistral AI, founded earlier this year by alumni of Google's DeepMind and Meta, aims to "make AI useful" for enterprises using only publicly available data and data contributed by customers. The company has now taken a big step toward that goal with Mistral 7B, a small model capable of low-latency text summarization, classification, text completion, and code completion.
Outperforming the Competition
Mistral AI claims that Mistral 7B already surpasses its open-source competitors. In benchmark tests covering a wide range of tasks, Mistral 7B consistently outperforms Llama 2 7B and 13B, two popular open models from Meta. In the Massive Multitask Language Understanding (MMLU) test, for example, Mistral 7B achieved 60.1% accuracy, while Llama 2 7B and 13B scored just above 44% and 55%, respectively. Mistral 7B also posted higher accuracy than the Llama models on tests of commonsense reasoning and reading comprehension.
The world knowledge benchmark was the only test where Llama 2 13B matched Mistral 7B, a result Mistral attributes to Mistral 7B's limited parameter count, which constrains how much knowledge the model can compress. On every other metric, Mistral 7B outperformed Llama 2 13B.
Potential Benefits for Businesses
Mistral's demonstration that a small model can deliver high performance across a variety of tasks could bring significant benefits to businesses. On the MMLU test, Mistral 7B delivers performance equivalent to a Llama 2 model more than three times its size (roughly 23 billion parameters). That saves memory and cuts serving costs without compromising final outputs.
The company attributes Mistral 7B's faster inference to grouped-query attention (GQA), which shares key/value heads across groups of query heads. A separate sliding window attention (SWA) mechanism lets the model handle longer sequences at a smaller cost by restricting each token's attention to a fixed window of recent tokens. In practice, these changes yield a 2x speed improvement for a sequence length of 16k with a window of 4k.
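GQA and SWA are general attention variants rather than techniques unique to Mistral, and the sketch below (a toy illustration, not Mistral's implementation) shows how the two ideas combine: a few key/value heads are shared across groups of query heads to shrink the KV cache, and a banded causal mask limits each token to the previous `window` positions.

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Causal mask where position i attends only to positions j with
    i - window < j <= i. Stacking layers extends the effective reach."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions, shape (seq, 1)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions, shape (1, seq)
    return (j <= i) & (j > i - window)

def grouped_query_sliding_attention(q, k, v, window):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d), n_kv_heads < n_q_heads.
    Each key/value head serves a group of query heads, shrinking the KV cache."""
    n_q_heads, seq_len, d = q.shape
    group = n_q_heads // k.shape[0]
    k = k.repeat_interleave(group, dim=0)        # expand KV heads to match query heads
    v = v.repeat_interleave(group, dim=0)
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # (n_q_heads, seq, seq)
    mask = sliding_window_mask(seq_len, window)  # (seq, seq), broadcast over heads
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy shapes: 8 query heads sharing 2 KV heads, a window of 4 over 16 tokens.
q, k, v = torch.randn(8, 16, 32), torch.randn(2, 16, 32), torch.randn(2, 16, 32)
print(grouped_query_sliding_attention(q, k, v, window=4).shape)  # (8, 16, 32)
```

With a window of 4k, each token's attention cost stays constant as the sequence grows to 16k, which is where the quoted 2x speedup comes from.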
Mistral AI plans to continue its work by releasing a larger model capable of better reasoning and of working in multiple languages, expected to debut in 2024. In the meantime, Mistral 7B can be deployed anywhere, from local environments to major cloud platforms such as AWS, GCP, or Azure, using the company's reference implementation and the vLLM inference server.
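As a concrete example of the vLLM route, the minimal sketch below loads the model and generates a completion; it assumes vLLM is installed, a suitable GPU is available, and the weights are published on the Hugging Face Hub under the mistralai/Mistral-7B-v0.1 identifier.

```python
from vllm import LLM, SamplingParams

# Load Mistral 7B; weights are fetched from the Hugging Face Hub.
# (Model ID assumed: mistralai/Mistral-7B-v0.1.)
llm = LLM(model="mistralai/Mistral-7B-v0.1")

params = SamplingParams(temperature=0.7, max_tokens=128)

# One of the low-latency use cases the company highlights: summarization.
prompt = "Summarize in one sentence: Mistral AI has released Mistral 7B, a 7.3B-parameter open model."
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```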