LLM Jailbreaking

Bypassing AI safeguards to trigger forbidden outputs

Legal, Models, Risk, Security
Updated 2 May 2025

Definition

A technique for bypassing the safeguards built into a language model, typically to elicit forbidden or unsafe outputs. A common example is prompting a model with adversarial instructions so that it discusses illegal activities it would otherwise refuse to address.