The advent of ChatGPT in late 2022 sparked fierce competition among AI companies and tech giants, all aiming to dominate the growing market for large language model (LLM) applications. Most firms responded by offering their language models as proprietary services, providing API access without disclosing model weights or the specifics of their training datasets and methodologies. Despite this trend toward closed models, the open-source LLM ecosystem grew significantly in 2023 and solidified its role as a prominent player in the enterprise LLM landscape.
Shift in Paradigm: From Size to Performance
Prior to 2023, it was widely believed that improving the performance of LLMs required increasing the model size. Open-source models like BLOOM and OPT, comparable to OpenAI’s GPT-3 with its 175 billion parameters, exemplified this approach. Although these large models were publicly accessible, they required substantial computational resources and specialized knowledge to run effectively.
However, in February 2023, Meta introduced Llama, a family of models ranging from 7 to 65 billion parameters. Llama demonstrated that smaller language models could achieve comparable performance to larger LLMs. The key to Llama’s success was training on a significantly larger corpus of data. While GPT-3 had been trained on approximately 300 billion tokens, Llama models ingested up to 1.4 trillion tokens. This approach of training more compact models on an expanded token dataset proved to be a game-changer, challenging the notion that size was the sole driver of LLM efficacy.
“Since the release of the original Llama by Meta, open-source LLMs have seen accelerated progress, and the latest open-source LLM, Mixtral, ranks as the third most helpful LLM in human evaluations behind GPT-4 and Claude,” said Jeff Boudier, head of product and growth at Hugging Face.
The Emergence of Derivative Models
Llama’s appeal extended beyond its performance. Its ability to run on a single GPU or a small cluster, combined with its open release, enabled the research community to rapidly build on its findings and architecture. This gave rise to a series of open-source LLMs that contributed novel facets to the ecosystem. Notable among these were Cerebras-GPT by Cerebras, Pythia by EleutherAI, MPT by MosaicML, XGen by Salesforce, and Falcon by the Technology Innovation Institute (TII).
In July, Meta released Llama 2, which became the basis for numerous derivative models. Mistral AI made a significant impact with the release of Mistral and Mixtral, the latter being particularly acclaimed for its capabilities and cost-effectiveness. These foundation models were in turn fine-tuned for specific downstream applications, producing derivatives such as Alpaca, Vicuna, Dolly, and Koala.
- According to data from Hugging Face, developers have already created thousands of forks and specialized versions of these models, reflecting the vibrant open-source community.
- Microsoft, the primary backer of OpenAI, has not only released two open-source models, Orca and Phi-2, but has also integrated open-source models into its Azure AI Studio platform.
- Amazon, a major investor in Anthropic, introduced Bedrock, a cloud service designed to host both proprietary and open-source models.
“As AI is the new way of building technology, AI just like other technologies before it will need to be created and managed in-house, with all the privacy, security, and compliance that customer information and regulation requires. And if the past is any indication, that means with open source,” Boudier emphasized.