paper
referenced-only
2024
paper:arxiv-2401-05566

Sleeper agents: Training deceptive LLMs that persist through safety training

By E. Hubinger · C. Denison · J. Mu · M. Lambert · M. Tong · M. MacDiarmid +4 more

Related work: refs + corpus + external arXiv

The Cited, in-corpus, and arXiv badges show which signals surfaced each row. Rows surfaced by more than one source are weighted higher.

Similar preprints — Semantic Scholar

Cited by (3)