concept
active
concept:hubinger-et-al-2024-sleeper-agents-training-deceptive-llms-that-persist-through-safety-training

Hubinger et al. 2024 - Sleeper agents: Training deceptive LLMs that persist through safety training

Key reference for the adversarial deception scenarios that SOO should be tested against.

Neighborhood — ranked by edge-count

Concepts (2)

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.