In a world where artificial intelligence (AI) is rapidly advancing, the need for robust security measures cannot be overstated. Recent findings by Lasso Security have shed light on the vulnerability of generative AI models and their platforms, exemplified by a potentially devastating attack that was averted thanks to their diligent efforts. Lasso researchers discovered that a staggering 1,681 API tokens were at risk of being compromised, exposing numerous organizations to potential breaches.
By scouring GitHub and Hugging Face repositories, Lasso researchers gained access to 723 accounts belonging to prominent organizations such as Meta, Hugging Face, Microsoft, Google, VMware, and more. Of these accounts, 655 were found to possess write permissions, and an alarming 77 accounts had full control over the repositories of major companies. The breach extended even further, as the researchers gained full access to the Bloom, Llama 2, and Pythia repositories, putting millions of users at risk of supply chain attacks.
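To illustrate the kind of triage involved, the sketch below shows how an exposed token's validity and scope could be checked with the `huggingface_hub` library. The token value and printed fields are hypothetical placeholders, and the exact response fields returned by `whoami()` may vary by account type; this is an assumption-laden sketch, not Lasso's actual tooling.

```python
# Minimal sketch: checking whether a leaked Hugging Face token is live
# and what account it belongs to. The token below is a placeholder.
from huggingface_hub import HfApi

leaked_token = "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"  # hypothetical exposed token

api = HfApi(token=leaked_token)

# whoami() returns account metadata for the token's owner; if the call
# succeeds, the token is still valid, and the response indicates which
# user and organizations it can act on behalf of.
identity = api.whoami()
print(identity.get("name"), identity.get("orgs", []))
```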
“Notably, our investigation led to the revelation of a significant breach in the supply chain infrastructure, exposing high-profile accounts of Meta,” stated Lasso’s researchers in response to VentureBeat’s questions.
“The gravity of the situation cannot be overstated. With control over an organization boasting millions of downloads, we now possess the capability to manipulate existing models, potentially turning them into malicious entities. This implies a dire threat, as the injection of corrupted models could affect millions of users who rely on these foundational models for their applications,” continued the Lasso research team.
Hugging Face, a platform that has become indispensable to organizations developing large language models (LLMs), currently serves over 50,000 organizations in their devops efforts. Its popularity stems from being an open-source platform that facilitates collaboration and knowledge sharing among devops teams, accelerating LLM development and enhancing the likelihood of models making it into production.
However, this very popularity makes Hugging Face an attractive target for attackers aiming to exploit vulnerabilities in LLMs and generative AI supply chains. The potential risks include the poisoning of training data, the exfiltration of models and model training data, and even model theft. The consequences of such breaches are far-reaching and costly.
Recognizing the need for deeper insight into Hugging Face’s security measures, Lasso’s researchers delved into its registry and API token security. Their investigation unearthed three emerging risks for LLMs, namely supply chain vulnerabilities, training data poisoning, and the threat of model theft. These risks highlight the need for stringent security measures to protect LLMs and their associated data.
Lasso researchers discovered the ease with which LLM application lifecycles can be compromised by vulnerable components or services. Additionally, the use of third-party datasets, pre-trained models, and plugins further exacerbates these vulnerabilities. The compromise of API tokens can lead to the poisoning of training data, introducing vulnerabilities and biases that jeopardize LLM and model security.
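The sketch below shows why a write-scoped token is so dangerous in this context: a single API call can silently replace a file in a model or dataset repository that downstream users pull automatically. The repository name, file path, and token are hypothetical placeholders used purely to illustrate the risk.

```python
# Illustrative sketch only: how write permission on a repo translates into
# the ability to tamper with artifacts that downstream users consume.
from huggingface_hub import HfApi

api = HfApi(token="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")  # hypothetical write-scoped token

# With write access, one call overwrites a file in the target repo;
# anyone pulling the repo afterward receives the tampered artifact.
api.upload_file(
    path_or_fileobj=b"tampered contents",
    path_in_repo="data/train.jsonl",         # hypothetical file
    repo_id="example-org/example-dataset",   # hypothetical repo
    repo_type="dataset",
    commit_message="routine update",
)
```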
One alarming finding was that compromised API tokens could be swiftly used to gain unauthorized access to, copy, or exfiltrate proprietary LLM models. For startups relying on AWS-hosted platforms, training models on AWS ECS instances can cost thousands of dollars per month, which makes the theft of a trained model especially costly.
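Exfiltration itself requires little more than a valid token, as the hedged sketch below suggests. The repository ID and token are hypothetical placeholders; `snapshot_download` simply pulls every file in a repo the token can read to the local disk.

```python
# Minimal sketch of the exfiltration risk: a token with read access to a
# private model repo is all that is needed to copy the full model locally.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="example-org/private-llm",          # hypothetical private model repo
    token="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",  # hypothetical compromised token
)
print(f"Model files downloaded to {local_path}")
```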
“We had the opportunity to ‘steal’ over ten thousand private models associated with over 2,500 datasets,” said Lasso’s research team. They argued that the OWASP Top 10 for LLMs should rename “Model Theft” to “AI Resource Theft (Models & Datasets)” due to the gravity of the situation.
The breach narrowly avoided by Hugging Face highlights the delicate balance between innovation and security in the realm of LLM and generative AI development platforms. Bar Lanyado, a security researcher at Lasso Security, recommends constantly scanning for exposed API tokens and either revoking them or notifying the affected users and organizations. Hard-coded tokens should be avoided, and best practices should be followed to safeguard repositories from inadvertent exposure of sensitive data.
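A minimal sketch of the kind of scan Lanyado describes appears below: it walks a local checkout and flags strings that look like Hugging Face tokens, which carry the "hf_" prefix. The exact token length and character set in the pattern are assumptions, and the function name is purely illustrative; in practice, tokens should be read from the environment rather than hard-coded.

```python
# Sketch: scan a repository checkout for strings that look like hard-coded
# Hugging Face tokens. Pattern details are assumptions, not an official spec.
import os
import re
from pathlib import Path

TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9]{20,}")

def find_exposed_tokens(repo_root: str) -> list[tuple[str, int]]:
    """Return (file path, line number) pairs where a token-like string appears."""
    hits = []
    for path in Path(repo_root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if TOKEN_PATTERN.search(line):
                hits.append((str(path), lineno))
    return hits

# The safer pattern: load the token from the environment at runtime
# instead of committing it to source control.
hf_token = os.environ.get("HF_TOKEN")
```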
To effectively manage API tokens, Hugging Face must adopt strategies such as unique authentication during token creation, multi-factor authentication, ongoing authentication for least privilege access, and rigorous lifecycle management. A zero-trust approach to API tokens is vital for Hugging Face to mitigate risks and maintain the security of their platform.
As Lasso Security’s research team warns, greater vigilance alone is insufficient when securing thousands of API tokens. Organizations must adopt posture management techniques and continually strengthen their security postures, particularly at the API token level. The research underscores the essential need for comprehensive verification and security solutions designed to protect transformative models at every stage of development.