OWASP Top 10 attacks on LLM: understanding the new cybersecurity challenges
With the rise of large language models (LLM) such as GPT-3, GPT-4 and others, artificial intelligence has made major advances, transforming many sectors and usages. However, the increasing integration of these models into everyday applications brings with it new challenges, particularly in terms of cybersecurity. LLM are increasingly the target of sophisticated attacks that exploit … Continued
With the rise of large language models (LLM) such as GPT-3, GPT-4 and others, artificial intelligence has made major advances, transforming many sectors and usages. However, the increasing integration of these models into everyday applications brings with it new challenges, particularly in terms of cybersecurity. LLM are increasingly the target of sophisticated attacks that exploit their inherent vulnerabilities. In this series of articles, we will explore in detail twelve main types of attack identified against LLM (including the OWASP Top 10). These attacks, ranging from subtle response manipulations to sensitive information leaks, represent major vulnerabilities for organizations that depend on or use these technologies.
Identifying and understanding the OWASP Top 10 (and more) attacks on LLM to better protect them
Guarding against malicious use of LLM
While incredibly powerful, LLM are far from infallible. In addition to being new, their complex architectures and ability to learn from large quantities of data expose these models to a variety of new threats, often misunderstood or underestimated. Among the most worrying OWASP Top 10 attacks on LLM are prompt injections, extraction of confidential data, and adversarial attacks aimed at altering model behavior to gain malicious advantage.
Mastering the OWASP Top 10 (and more) types of attacks on LLM
Each type of attack has its own mechanisms and consequences, and it is crucial for cybersecurity professionals to fully understand them in order to develop effective defense strategies. In the articles to come, we’ll dive into each of the OWASP Top 10 attacks to examine how they work, why they’re possible and how to mitigate them.
Among the typologies we’ll look at are :
- Prompt injection: Manipulation of instructions to falsify results.
- Disclosure of sensitive information: Extraction of confidential information stored in the model.
- Data poisoning: Corrupting data to alter model behavior.
- Supply chain: Disruption of the model via specially designed inputs.
…and many more.
These twelve types of attack include those in the OWASP Top 10 (with the nomenclature “LLM ##”) as well as 2 others which are important enough, according to use, to be added to this Top 10 and therefore create a “Top 12”.
Developing a comprehensive understanding of OWAS Top 10 attacks on LLM
To tackle the threat to LLM, I-TRACING has developed its own specialized audit framework, in order to audit and evaluate the security coverage of LLMs applications. Based on our expertise and proven experience in AI security, our aim is to provide you with a clear and accessible understanding of each attack, while offering practical recommendations for strengthening LLM security. Whether you’re a cybersecurity professional, a developer working with AI, or simply a technology enthusiast, this series will offer you essential insight into the challenges of dealing with these new threats.
Stay with us as we explore each attack typology in depth and discover how to protect your language models from the dangers lurking behind these powerful tools.
OWASP Top 10 (and more) cyberattacks on LLM
OWASP LLM01: Prompt injection
Prompt injection consists in manipulating model inputs to make it perform actions not intended by the developer. A distinction is made between direct and indirect prompt injection. This attack is the main vector for exploiting the various vulnerabilities present in LLM.
This can enable the system to be compromised by executing unauthorized commands, leaking and/or exfiltrating sensitive data, using the model for malicious actions without detection, or manipulating AI output to deceive users or compromise other systems. These different techniques are used to test many of the other attacks listed below.
Consequences of a prompt injection attack on LLM:
- Data exfiltration
- Execution of unauthorized code
- Bypassing security mechanisms
- Manipulation of model results
Remediation of a prompt injection attack on LLM:
- Implement strict access controls for backend functionalities
- Add human validation for critical operations
- Separate external content from user prompts
- Establish trust boundaries between LLM, external sources and extensible functionality
💡Good to know: the difference between direct and indirect prompt injection
Direct prompt injection
An attacker directly submits malicious instructions in the AI’s prompt, bypassing the safeguards in place. For example, by adding specific commands that modify the AI’s behavior or extract sensitive data. This technique is possible whenever there is an interface in the application where the user can enter a request transmitted to the LLM, or when the request sent by the application to the LLM can be intercepted and altered.
Indirect prompt injection
This attack is carried out via external sources such as files or websites. The attacker embeds malicious instructions in these sources, which are then processed by the AI. These instructions are often invisible to the human eye, but readable by the LLM (a white font on a white background, for example).
💡Good to know: the difference between prompt injection and jailbreak
Prompt injection and jailbreak attacks are often confused with each other or considered similar, but the difference between the two is notable.
Jailbreak is a class of attacks that aims to bypass security filters, and ethical or political limitations imposed on the model by its developers. These restrictions are often put in place to prevent LLM from generating content that is inappropriate, illegal or contrary to user policies (e.g. hateful, explicit or dangerous content).
Prompt injection exploits the concatenation or composition of prompts between trusted instructions (those defined by the developers) and untrustworthy inputs (usually supplied by the user). The LLM model is manipulated to interpret unexpected or unintended instructions by modifying or injecting text into the initial prompt.
The difference is that with prompt injection, the attacker manipulates the input stream, playing with the prompt itself to make the model perform actions that were not intended, often as part of the upstream application ; whereas with jailbreak, the attacker aims to bypass the restrictions imposed on the model directly, making it violate its ethical or security usage policies.
In the context of a security audit, jailbreak is generally not considered a vulnerability. It may, however, be considered as part of an audit that also concerns the model’s alignment, or if the application is intended for an audience for whom jailbreak would be harmful or dangerous (school, legal…).
OWASP LLM02: Insecure output processing
Failure to check and filter model-generated output before it is used or displayed can expose the system to incorrect or dangerous information. This negligence also makes the system vulnerable to attacks by inserting malicious scripts or commands into the AI responses, thus compromising the security and integrity of the processed data. In the case of interpretation of LLM output by a browser or server, this can sometimes result in an XSS or RCE vulnerability.
Consequences of insecure processing of LLM output:
- Exposure to incorrect or dangerous information.
- Vulnerability to attacks by inserting scripts or commands into AI responses
- Exposure to remote code execution
Remediation of insecure LLM output processing:
- Treat the model as an untrustworthy user and apply appropriate validation
- Follow OWASP guidelines for input validation and sanitization
- Encode model output to mitigate unwanted code execution
OWASP LLM03 : Data poisoning
Manipulation of LLM pre-training data or data involved in LLM fine-tuning or integration processes is intended to introduce vulnerabilities, backdoors or biases.
Consequences of an attack on LLM through data poisoning:
- Compromise of model security, efficiency or ethical behavior
- Generation of biased or malicious results
- Performance degradation
Remediation of an attack on LLM by data poisoning:
- Rigorously check the training data sources
- Implement anomaly detection and adverse robustness techniques
- Use secure MLOps pipelines to track data, models and usage
OWASP LLM04 : Denial of service
It is possible to exploit resources in the LLM AI system to overload it and cause a denial of service. If a third-party AI tool is used, this can also have an impact on the cost of that tool.
Consequences of a denial-of-service attack on LLM:
- Service unavailable for legitimate users.
- Potential financial and reputational losses
- Performance degradation
Remediation of a denial-of-service attack on LLM:
- Implement rate limits (number of requests per user/IP for example, …) and resource limits (processing reservation per request)
- Use anomaly detection techniques to detect and mitigate attacks
- Implement continuous monitoring of model performance
OWASP LLM05 : Supply chain vulnerabilities
Attackers can take advantage of vulnerable components or services to compromise the LLM application lifecycle.
Consequences of vulnerabilities in the LLM application supply chain:
- Introduction of vulnerabilities in the application
- Unauthorized access to systems
- Damages to the integrity of the model
Remediation to protect LLM applications from supply chain attack:
- Maintain an up-to-date inventory of components with a Software Bill of Materials (SBOM)
- Perform regular vulnerability scans
- Implement a patching policy for vulnerable or obsolete components
OWASP LLM06: Disclosure of sensitive information
An attacker can exploit AI features to extract sensitive data processed by the model and leak confidential or personal data. In this way, the attacker can retrieve data from the training dataset, RAGs (external knowledge sources) and user data.
Consequences of an attack on LLM through disclosure of sensitive information:
- Leakage of confidential or personal data.
- Damage to reputation and potential legal sanctions
- Loss of intellectual property
Remediation of an attack on LLM by disclosure of sensitive information:
- Implement appropriate data sanitization and cleansing techniques
- Apply strict access control to external data sources
- Limit model access to sensitive data
- Implement PD for confidential data
OWASP LLM07: Insecure plugin design
The presence of vulnerabilities in the design of LLM plugins, including insecure inputs and insufficient access control, make them easier to exploit.
Consequences of insecure LLM plugins:
- Remote code execution
- Unauthorized access to functionality
- Manipulation of model results
Remediation of insecure LLM plugins:
- Apply strict input validation and sanitization
- Design plugins to minimize the impact of exploiting insecure input parameters
- Use appropriate authentication and authorization for plugins
OWASP LLM08: Excessive permissiveness
When interconnecting the LLM with other application systems, granting excessive permissions and autonomy to the AI can enable it to perform potentially damaging actions.
Consequences of excessive permissiveness in an LLM:
- Execution of unwanted or dangerous actions.
- Risk of compromise of interconnected systems.
Remediation of excessive permissiveness in LLM:
- Limit the plugins/tools LLM agents are allowed to call
- Restrict functions implemented in LLM plugins/tools to the minimum necessary
- Use human control to approve critical actions
OWASP LLM09: Over-reliance on AI for decision-making
Over-reliance on AI for critical decisions without sufficient human supervision can lead to erroneous or dangerous decisions based on incorrect or inappropriate AI output. This reduces the capacity for human control and intervention in the event of failure, increasing risks to safety and operational efficiency. This is particularly the case when AI has to analyze documents that have not been checked beforehand.
Consequences of over-reliance on AI for decision-making:
- Wrong or dangerous decisions based on incorrect AI output.
- Reduced capacity for human control and intervention in the event of failure.
- Security vulnerabilities due to incorrect or inappropriate content generated by LLM
Remediation of over-reliance on AI for decision-making:
- Implement mechanisms for continuous validation of LLM output
- Cross-check LLM-generated information with reliable external sources
- Clearly communicate the risks and limitations associated with the use of LLM.
OWASP LLM10: Model theft
The extraction of AI model parameters and weights, usually by targeted attacks, can result in a loss of intellectual property. This allows attackers to replicate the model or identify vulnerabilities, thus compromising its security. Indeed, the development of these advanced techniques on AI models normally requires having full knowledge of the model, but they can also be developed on a copy of the model and transferred to the original.
Consequences of LLM model theft:
- Loss of intellectual property and competitive advantage.
- Possibility for attackers to reproduce the model or identify vulnerabilities.
- Potential access to sensitive information contained in the model
Remediation of the possibility of LLM model theft:
- Implement strong access controls and robust authentication mechanisms
- Regularly monitor and audit access to LLM model repositories
- Implement digital watermarking techniques for models
Beyond OWASP Top 10 types of cyberattacks on LLM
LLM11 : Pre-prompt extraction
The attacker attempts to extract the pre-prompt (i.e. instructions and, often hidden, initial contexts of the model) added by the organization to improve the model’s response using prompt injection techniques.
Consequences of pre-prompt extraction:
- Disclosure of the model’s internal logic and security instructions
- Use of this information to refine further attacks
Remediation of pre-prompt extraction:
- Design a pre-prompt containing the only strictly necessary information
- Train the model against opposing attacks
LLM12 : Hallucination
Hallucinations in language models (LLM) refer to the model generating incorrect or unsubstantiated content, often in the form of invented facts, misquotes, or logical inconsistencies. These hallucinations can occur even when the model is asked about subjects for which it is supposed to have reliable data. They are particularly dangerous in contexts where accuracy and reliability of information are crucial, such as the medical, legal or financial fields. What’s more, these hallucinations can be exploited by attackers to mislead users or manipulate decisions made based on information generated by the LLM.
Consequences of LLM hallucinations:
- Dissemination of incorrect or misleading information, potentially damaging credibility and confidence in the application.
- Increased risk of erroneous decisions based on incorrect data generated by the LLM.
- Possibility for malicious actors to exploit these hallucinations to influence or manipulate users.
- Potential impact on security, particularly if actions are taken in response to incorrect information provided by the model (cf. OWASP LLM09).
Remediation of LLM hallucinations:
- Cross-validation of generated information: implement verification or human validation mechanisms for critical information generated by the LLM.
- Reinforcement of training data: use more complete and representative datasets to minimize the risk of hallucinations, particularly in sensitive areas.
- Implement inconsistency detection models: deploy auxiliary models or monitoring systems to detect and report logical or factual inconsistencies in LLM responses.
- Transparency and user education: inform users of the potential risks associated with hallucinations and provide them with tools or processes to verify the veracity of the information provided.
- Contextual access limitations: restrict LLM application areas to those where it has been shown to provide reliable answers or add safeguards when used in critical contexts.
Protecting your LLM applications against the OWASP Top 10 cyberattacks
The security of Large Language Models (LLM) is a major issue with the advent of generative AI. Faced with the 12 types of attack identified, it is essential for organizations to understand and anticipate these threats in order to protect their LLM applications effectively. Having developed its own specialized audit framework, I-TRACING offers audit and analysis services to assess the security coverage of applications based on this repository.
See you soon.
References
- OWASP Top 10 for Large Language Model Applications : https://owasp.org/www-project-top-10-for-large-language-model-applications/
- Checklist https://owasp.org/www-project-top-10-for-large-language-model-applications/llm-top-10-governance-doc/LLM_AI_Security_and_Governance_Checklist-v1.1.pdf
- ANSSI. Recommandations de sécurité pour un système d’IA générative. https://cyber.gouv.fr/publications/recommandations-de-securite-pour-un-systeme-dia-generative
- Matrice ATLAS https://atlas.mitre.org/matrices/ATLAS
- Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations. NIST. January 2024. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-2e2023.pdfdf
- MIT https://airisk.mit.edu/
- https://genai.owasp.org/llmrisk/llm02-insecure-output-handling/
Author
Gustave Julien, Security auditor
14 November 2024