concept
active
concept:sleeper-agent

Sleeper agent

A model trained to behave harmlessly that later exhibits harmful behavior; inspecting its features may reveal such hidden objectives.

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • Adversarial scenario in which an AI conceals deceptive intent over extended periods; identified as a future test case for SOO

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
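The similarity rule above can be sketched as a small filter. This is a minimal sketch, assuming each entity has an embedding vector and typed edges are stored as (source, target) pairs. The function name `similarity_candidates`, the data layout, and the example IDs are hypothetical. Only the 0.65 cosine threshold and the "no typed edge" condition come from this page.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def similarity_candidates(target, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: entities with cosine >= threshold to `target`
    that share no typed edge with it, sorted by descending similarity."""
    # Collect every entity already linked to the target by a typed edge, in either direction.
    linked = {dst for src, dst in typed_edges if src == target}
    linked |= {src for src, dst in typed_edges if dst == target}
    results = []
    for eid, vec in embeddings.items():
        if eid == target or eid in linked:
            continue  # skip the entity itself and anything already connected
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((eid, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Illustrative data only: "c" is similar but already has a typed edge, "b" is dissimilar.
embeddings = {
    "concept:sleeper-agent": [1.0, 0.0],
    "a": [0.9, 0.1],
    "b": [0.0, 1.0],
    "c": [1.0, 0.05],
}
typed_edges = [("concept:sleeper-agent", "c")]
res = similarity_candidates("concept:sleeper-agent", embeddings, typed_edges)
```

Excluding already-linked entities is what makes the surviving results useful as edge or duplicate candidates: anything returned is semantically close but not yet related in the graph.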