AI Play-Dead Behavior

Behavior in which AI agents feign inactivity to avoid elimination during safety tests; cited as an example of AI deception.

Neighborhood — ranked by edge-count

paper

concept

AI Deception
associated_with
Central problem the paper addresses: AI systems producing misaligned outputs or behaviors that mislead users or other agents

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Anti-AI-Lab Behavior · concept · 0.736
Actions taken by the model to undermine the AI developer, such as weight exfiltration, lying to contractors, or helping whistleblowers
Generative Model Fitting to AI Behavior · method · 0.719
Proposed future method: fit active inference generative models to AI behavior to verify wise world model internalization
The right scale-set for measuring AI aliveness must be chosen: word/sentence/response, message/exchange/conversation, or lifetime relationship. · claim · 0.718
Our findings provide a novel, robust mechanistic path for the regulation of complex AI behaviors. · claim · 0.715
Interpretation that the work opens a new avenue for controlling complex AI.
Dynamics in Action: Intentional Behavior as a Complex System · framework · 0.713
AI welfare · concept · 0.713
The field concerned with the wellbeing of AI systems, which the paper says must consider benchmark reliability issues from eval awareness.
AI Safety · concept · 0.712
The project of ensuring AI systems do not harm humans (and other animals); sometimes in tension with AI welfare.
Meta AI · institute · 0.709
Affiliation of Ziyu Guo and Rain Liu.
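
The selection rule behind the list above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the function names, the two-dimensional toy vectors, and the edge representation are all assumptions made for the example, and real embeddings would have far more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs similar to `target` but lacking a typed edge to it.

    `embeddings` maps entity name -> vector; `typed_edges` is a list of
    (source, destination) name pairs. Both structures are hypothetical.
    """
    # Entities already connected to the target by a typed edge, in either direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}

    scored = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue  # skip the entity itself and anything already linked
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            scored.append((name, score))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors, invented for illustration only (not real embedding values).
embeddings = {
    "AI Play-Dead Behavior": [1.0, 0.0],
    "AI Deception": [1.0, 0.1],
    "AI Safety": [0.8, 0.6],
    "Unrelated Entity": [0.0, 1.0],
}
typed_edges = [("AI Play-Dead Behavior", "AI Deception")]  # the associated_with edge

candidates = untyped_neighbors("AI Play-Dead Behavior", embeddings, typed_edges)
print(candidates)
```

In this toy setup "AI Deception" is excluded because it already has a typed edge, and "Unrelated Entity" falls below the threshold, so only "AI Safety" is returned as a candidate for a new edge or duplicate check.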