Adversarial testing to find AI system vulnerabilities
Red teaming in the context of artificial intelligence refers to the systematic practice of simulating adversarial attacks against AI systems to identify vulnerabilities, weaknesses, and potential risks before deployment. No universally agreed definition exists across academic, industry, and regulatory contexts, reflecting the evolving nature of this practice as it adapts from traditional cybersecurity to address the unique challenges posed by AI systems.
The Biden Administration's Executive Order on AI defines AI red teaming as "a structured testing effort to find flaws and vulnerabilities in an AI system, often in a controlled environment and collaboration with developers of AI", noting that it "is most often performed by dedicated 'red teams' that adopt adversarial methods to identify flaws and vulnerabilities, such as harmful or discriminatory outputs from an AI system, unforeseen or undesirable system behaviors, limitations, or potential risks associated with the misuse of the system."
AI red teaming differs from traditional cybersecurity red teaming in scope and methodology. Whilst traditional red teaming focuses primarily on network infiltration and system exploitation, AI red teaming addresses both traditional security concerns and AI-specific risks including algorithmic bias, harmful content generation, model hallucinations, prompt injection attacks, and safety alignment failures.
The term "red teaming" originates from Cold War-era military exercises where designated "red teams" played adversarial roles against "blue teams" to test defence strategies and identify vulnerabilities. The practice was subsequently adopted by the cybersecurity community to describe systematic attacks on computer systems to identify security weaknesses.
AI red teaming emerged as generative AI systems, particularly large language models, became more prevalent and demonstrated both impressive capabilities and concerning vulnerabilities. The practice has evolved from primarily manual testing approaches to incorporate automated tools and frameworks, though human expertise remains crucial for creative attack scenarios and contextual evaluation.
AI red teaming encompasses diverse methodologies including prompt-based testing (attempting to manipulate AI systems through carefully crafted inputs), adversarial example generation (creating inputs designed to fool AI models), system-level attacks (targeting the broader infrastructure surrounding AI models), and social engineering approaches (exploiting human-AI interaction patterns).
Microsoft's AI Red Team, based on experience red teaming over 100 generative AI products, has identified key principles including understanding system capabilities and applications, recognising that effective attacks don't require gradient computation, distinguishing red teaming from safety benchmarking, leveraging automation whilst maintaining human oversight, and acknowledging that AI security is an ongoing rather than completed endeavour.
Academic research has established red teaming as a critical component of AI safety evaluation. Research demonstrates significant effectiveness differences between automated and manual approaches, with automated red teaming achieving approximately 69.5% success rates compared to 47.6% for manual attempts in controlled studies. However, academic sources emphasise that red teaming presents sociotechnical challenges involving values, labour conditions, and psychological impacts on practitioners.
Industry implementations vary significantly across organisations. Google's AI Red Team applies traditional cybersecurity expertise combined with AI-specific knowledge to simulate attacks from various threat actors including nation states and criminal groups. OpenAI conducts both external red teaming campaigns with domain experts and automated red teaming using AI systems to probe their own models. IBM Research has developed red team LLMs specifically designed to identify vulnerabilities in other language models.
Red teaming has gained prominence in AI policy and regulation globally. The EU AI Act, whilst not explicitly mandating red teaming, requires risk assessments and safety evaluations that often involve red teaming methodologies. The Biden Administration's Executive Order positions red teaming as a key component of responsible AI development, particularly for dual-use foundation models.
However, definitional ambiguity creates challenges for regulatory compliance. The Center for Security and Emerging Technology notes that "conflating 'red-teaming' with the broader category of 'AI testing' and failing to establish a common definition of AI red teaming is likely to leave such a requirement wide open for interpretation." This regulatory uncertainty means that organisations may interpret red teaming requirements differently, potentially leading to inconsistent implementation standards.
Red teaming faces several acknowledged limitations. It cannot guarantee comprehensive security coverage, as new attack vectors continuously emerge. The practice requires significant expertise and resources that may not be available to all organisations developing AI systems. Additionally, red teaming results may not generalise across different deployment contexts or user populations.
Academic research highlights that "AI red teaming is not safety benchmarking" and cannot replace systematic measurement and evaluation processes. The dynamic nature of AI systems means that vulnerabilities may emerge through model updates, fine-tuning, or changed deployment contexts even after successful red teaming exercises.
For legal practitioners, red teaming represents both a due diligence practice and a potential source of liability evidence. Organisations that conduct thorough red teaming demonstrate proactive risk management, whilst those that deploy AI systems without adequate testing may face increased liability exposure in case of harmful outcomes.
The practice raises questions about disclosure obligations, intellectual property protection for red teaming methodologies, and the extent to which red teaming results should be shared with regulators or the broader AI safety community. As red teaming becomes more standardised, failure to conduct appropriate testing may constitute negligence in AI system deployment.
Biden Administration Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, Section 3(d) (2023). Bullwinkel, B., et al. "Lessons From Red Teaming 100 Generative AI Products." arXiv:2501.07238 (2025). Center for Security and Emerging Technology, "What Does AI Red-Teaming Actually Mean?" (2023). Gillespie, T., et al. "AI red-teaming is a sociotechnical challenge: on values, labor, and harms." arXiv:2412.09751 (2024). Google AI Red Team Report (2023). IBM Research, "What is red teaming for generative AI?" (2025). Microsoft, "Planning red teaming for large language models (LLMs) and their applications." OpenAI, "Advancing red teaming with people and AI."