Despite the recent power struggle and mass resignations at OpenAI, Microsoft is continuing to make progress in the field of AI. Microsoft Research has unveiled Orca 2, a pair of small language models that outperform far larger models on complex reasoning tasks.
Enhanced Reasoning Abilities
The Orca 2 models, available in 7-billion- and 13-billion-parameter versions, have demonstrated strong reasoning abilities in zero-shot settings. These smaller models have matched or outperformed language models five to ten times their size, including Meta’s Llama-2 Chat-70B.
This achievement builds on the earlier 13B Orca model, which learned by imitating the step-by-step reasoning traces of larger models. The improved training signals and methods used with Orca 2 give these smaller models reasoning abilities typically found only in much larger language models.
“With Orca 2, we continue to show that improved training signals and methods can empower smaller language models to achieve enhanced reasoning abilities, which are typically found only in much larger language models,” Microsoft researchers wrote in a joint blog post.
To further the development and evaluation of smaller models, Microsoft has open-sourced both the 7 billion and 13 billion parameter models. This allows for broader research and gives enterprises, particularly those with limited resources, a better option for addressing their targeted use cases without requiring significant computing power.
Addressing the Gap
While larger language models have impressed with their ability to reason and answer complex questions, their smaller counterparts have struggled in this area. Microsoft Research aimed to bridge this gap by fine-tuning Llama 2 base models on a highly tailored synthetic dataset.
Instead of using imitation learning, where smaller models simply replicate the behavior of larger ones, the researchers trained the Orca 2 models to employ different solution strategies for different tasks. This approach acknowledges that a strategy that serves a larger model well may not suit a smaller one.
“In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.). More crucially, we aim to help the model learn to determine the most effective solution strategy for each task,” the researchers wrote in a published paper.
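The idea of routing a task to a reasoning strategy can be illustrated with a minimal prompting sketch. The strategy names below follow the paper's list, but the template wordings and the `build_prompt` helper are illustrative assumptions, not Orca 2's actual training prompts.

```python
# Illustrative sketch: wrap a task in an instruction for a chosen
# reasoning strategy. Templates are hypothetical, not from the paper.
STRATEGY_TEMPLATES = {
    "step-by-step": "Solve the problem, showing each step of your reasoning.\n\n{task}",
    "recall-then-generate": "First recall the relevant facts, then write your answer.\n\n{task}",
    "recall-reason-generate": "Recall relevant facts, reason over them, then answer.\n\n{task}",
    "direct-answer": "Answer directly and concisely.\n\n{task}",
}

def build_prompt(task: str, strategy: str) -> str:
    """Return the task wrapped in the instruction for the given strategy."""
    if strategy not in STRATEGY_TEMPLATES:
        raise ValueError(f"unknown strategy: {strategy}")
    return STRATEGY_TEMPLATES[strategy].format(task=task)

prompt = build_prompt("What is 17 * 24?", "step-by-step")
print(prompt)
```

Orca 2's contribution is that the model itself learns which strategy to apply for a given task, rather than having one picked externally as in this sketch.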
The training data for the Orca 2 models was obtained from a more capable teacher model. This data taught the smaller models how to use a reasoning strategy and when to use it for a given task. When tested on 15 diverse benchmarks in zero-shot settings, the Orca 2 models produced impressive results, outperforming larger models in most cases.
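A zero-shot benchmark sweep of this kind boils down to scoring the model's first-attempt answers on each task set and averaging across benchmarks. The sketch below shows that aggregation; the benchmark names and answers are made up for illustration and have nothing to do with the 15 benchmarks actually used.

```python
# Minimal zero-shot evaluation sketch: exact-match accuracy per benchmark,
# then a macro average. All data here is toy/illustrative.
def accuracy(predictions, references):
    """Fraction of predictions that exactly match the gold answers."""
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)

# benchmark name -> (model answers, gold answers)
benchmarks = {
    "toy-arithmetic": (["408", "9"], ["408", "10"]),
    "toy-recall": (["Paris"], ["Paris"]),
}

scores = {name: accuracy(preds, golds) for name, (preds, golds) in benchmarks.items()}
macro_avg = sum(scores.values()) / len(scores)
print(scores, round(macro_avg, 3))  # → {'toy-arithmetic': 0.5, 'toy-recall': 1.0} 0.75
```

"Zero-shot" here means the prompt contains no worked examples, so the score reflects reasoning the model already carries rather than in-context imitation.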
Future Advancements and Implications
While the Orca 2 models have shown promising reasoning performance, they still inherit limitations common to language models in general. Microsoft believes the technique used to create them can be applied to other base models as well.
“While it has several limitations, Orca 2’s potential for future advancements is evident, especially in improved reasoning, specialization, control, and safety of smaller models. The use of carefully filtered synthetic data for post-training emerges as a key strategy in these improvements. As larger models continue to excel, our work with Orca 2 marks a significant step in diversifying the applications and deployment options of language models,” the research team stated.
With the release of the open-source Orca 2 models and ongoing research in the field, it is clear that more high-performing small language models are likely to emerge in the near future. Other companies, such as China’s 01.AI and Paris-based Mistral AI, have also made strides in this area, indicating a growing trend toward smaller, high-performing language models.