Bypassing AI safeguards to trigger forbidden outputs
A technique for bypassing the safety guardrails of language models, typically to elicit restricted or unsafe outputs. A common form is prompting a model with adversarial instructions intended to make it discuss prohibited topics, such as illegal activities.