Summon a demon and bind it: A grounded theory of LLM red teaming.
Engaging in the deliberate generation of abnormal outputs from Large Language Models (LLMs) by attacking them is a novel human activity. This paper presents a thorough exposition of how and why people perform such attacks, defining LLM red teaming based on extensive and diverse evidence. Using a formal qualitative methodology, we interviewed dozens o