OpenAI has recently released GPT-4V, a new language model that supports image uploads. However, this release has created a new attack vector, making large language models (LLMs) susceptible to multimodal prompt injection image attacks. These attacks involve embedding commands, malicious scripts, and code within images, which the model will then comply with.
Multimodal prompt injection image attacks have the potential to exfiltrate data, redirect queries, spread misinformation, and execute complex scripts to manipulate how LLMs interpret data. This means that attackers can bypass the safety measures in place and issue commands that can compromise an organization, leading to fraudulent activities and operational sabotage.
The Greatest Risk for Businesses
While all businesses using LLMs are at risk, those heavily relying on LLMs to analyze and classify images are at the greatest exposure. Attackers using various techniques can manipulate how these images are interpreted and classified, resulting in chaos and misinformation.
Once an LLM’s prompt is overridden, it becomes even more vulnerable to malicious commands and execution scripts. Attackers can take advantage of this vulnerability by embedding commands in a series of images uploaded to an LLM, potentially launching fraud and operational sabotage while contributing to social engineering attacks.
Furthermore, LLMs lack a data sanitization step in their processing, making them inherently trusting of every image they receive. Just as it is dangerous to allow unrestricted access to identities on a network, the same holds true for images uploaded into LLMs. Enterprises with private LLMs must adopt a least privilege access approach as a core cybersecurity strategy.
GPT-4V as a Primary Vector for Prompt Injection Attacks
“(LLMs’) only source of information is their training data combined with the information you feed them,” writes Simon Willison, describing why GPT-4V is a primary vector for prompt injection attacks. “If you feed them a prompt that includes malicious instructions — however those instructions are presented — they will follow those instructions.”
– Simon Willison
Simon Willison also demonstrated how prompt injection can hijack autonomous AI agents like Auto-GPT. He explained the process of a visual prompt injection that starts with commands embedded in a single image, followed by an example of a visual prompt injection exfiltration attack.
According to Paul Ekwere, senior manager for data analytics and AI at BDO UK, “prompt injection attacks pose a serious threat to the security and reliability of LLMs, especially vision-based models that process images or videos. These models are widely used in various domains, such as face recognition, autonomous driving, medical diagnosis, and surveillance.”
As of now, OpenAI does not have a solution to counter multimodal prompt injection image attacks, leaving the responsibility on users and enterprises to protect themselves. However, an Nvidia Developer blog post provides guidance, including enforcing least privilege access and implementing security measures for data stores and systems.
Exploiting Gaps in GPT-4V’s Visual Processing
Multimodal prompt injection attacks take advantage of gaps in GPT-4V’s visual processing to execute malicious commands without detection. GPT-4V relies on a vision transformer encoder to convert images into a latent space representation. The model combines the image and text data to generate a response.
However, GPT-4V lacks a method to sanitize visual input before encoding it. This presents an opportunity for attackers to embed an unlimited number of commands that the model would consider legitimate. As a result, attackers can automate multimodal prompt injection attacks against private LLMs without raising any suspicion.
One concerning aspect of using images as an attack vector is that it can gradually erode the credibility and fidelity of the data that LLMs are trained on over time.
Protecting LLMs Against Prompt Injection Attacks
A recent study offers guidelines to help LLMs better defend against prompt injection attacks. The researchers aimed to assess the risks and identify potential solutions for penetration into LLM-integrated applications.
The study revealed that 31 LLM-integrated applications were vulnerable to injection attacks. To contain these attacks, the study recommended implementing identity-access management (IAM) and least privilege access for enterprises relying on private LLMs.
LLM providers also need to consider how image data can be sanitized before processing, reducing the risk of user input impacting LLM code and data. Image prompts should undergo processing to ensure they do not interfere with internal logic or workflows.
Creating a multi-stage process to detect and prevent image-based attacks at an early stage can help manage this threat vector. Additionally, appending prompts to image inputs that appear malicious can provide an extra layer of protection to LLMs. However, it is important to note that advanced attacks may still find ways to bypass these measures.
With LLMs increasingly adopting multimodal capabilities, images are emerging as a new and potent threat vector that attackers can exploit to bypass and redefine security measures. Such image-based attacks can range from simple commands to sophisticated scenarios involving industrial sabotage and widespread misinformation.