Meta’s Llama 2 Long AI Model Outperforms Competition in Long Prompts

Meta Platforms recently unveiled several new AI features for its popular consumer-facing services, Facebook, Instagram, and WhatsApp, at the Meta Connect conference. However, the most notable development from the company came quietly through a computer science paper published by Meta researchers on arXiv.org.

The paper introduces Llama 2 Long, an enhanced AI model built on Meta’s open-source Llama 2. It was produced through continual pretraining on longer training sequences and on a dataset in which long texts are upsampled. According to the researchers, these changes allow the new model to surpass leading competitors, including OpenAI’s GPT-3.5 Turbo and Anthropic’s Claude 2, on tasks involving long user prompts.

Improving Performance Through Enriched Dataset and Model Architecture

The Meta researchers expanded the original Llama 2 training data with an additional 400 billion tokens drawn from longer text sources. They kept Llama 2’s architecture unchanged except for one necessary adjustment to the positional encoding, the Rotary Positional Embedding (RoPE). RoPE encodes a token’s position by rotating pairs of dimensions in its query and key vectors through position-dependent angles, which lets attention depend on relative distance without additional learned parameters.
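As a rough illustration (a simplified sketch, not Meta’s implementation), RoPE can be viewed as a position-dependent 2-D rotation applied to each consecutive pair of vector dimensions:

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Apply a Rotary Positional Embedding rotation to one vector.

    x    : 1-D array of even length d; consecutive pairs
           (x[0], x[1]), (x[2], x[3]), ... form 2-D sub-vectors.
    pos  : integer token position.
    base : frequency base (Llama 2 used 10,000).
    """
    d = x.shape[0]
    assert d % 2 == 0, "embedding dimension must be even"
    # One rotation frequency per 2-D pair: base^(-2i/d)
    freqs = base ** (-np.arange(0, d, 2) / d)
    angles = pos * freqs                # rotation angle for each pair
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]           # split into pair components
    out = np.empty_like(x, dtype=float)
    out[0::2] = x1 * cos - x2 * sin     # standard 2-D rotation
    out[1::2] = x1 * sin + x2 * cos
    return out

# A rotation preserves the vector's norm, and position 0 is the identity.
q = np.arange(8, dtype=float)
q_rotated = rope_rotate(q, pos=5)
```

Because the rotation angle grows with position, the inner product of two rotated query/key vectors depends only on their relative distance, which is what makes RoPE attractive for attention.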

By decreasing the rotation angle of the RoPE encoding, the researchers ensured that Llama 2 Long could still attend effectively to more distant tokens, extending its usable context. They then applied reinforcement learning from human feedback (RLHF) and synthetic data generated by Llama 2 Chat to further improve the model’s performance in coding, math, language understanding, common-sense reasoning, and question answering.
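A minimal sketch of that adjustment: the rotation angle at each position is proportional to a per-pair frequency derived from a base hyperparameter, so raising the base shrinks every per-position angle (the value 500,000 below is the base reported for Llama 2 Long and should be treated as an assumption here):

```python
import numpy as np

def rotation_angles(pos, d=128, base=10000.0):
    """Per-pair RoPE rotation angles at token position `pos`."""
    freqs = base ** (-np.arange(0, d, 2) / d)
    return pos * freqs

# Compare a distant position under the original and the enlarged base.
far = 16384
a_llama2 = rotation_angles(far, base=10_000.0)    # original Llama 2 base
a_long   = rotation_angles(far, base=500_000.0)   # reported Llama 2 Long base

# The lowest-frequency pair rotates less with the larger base,
# so far-apart tokens keep more usable attention signal.
print(a_long[-1] < a_llama2[-1])  # → True
```

Smaller angles at long distances mean the relative-position signal decays more slowly, which is the intuition behind extending the context window this way.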

Implications for the Open-Source AI Community

The release of Llama 2 Long and its strong benchmark results have drawn significant attention and enthusiasm within the open-source AI community on platforms like Reddit, Twitter, and Hacker News. The work validates Meta’s commitment to an “open source” approach in generative AI and demonstrates that open models can compete with closed-source alternatives offered by well-funded startups.
