concept
active
concept:instinct-for-self-preservation
Instinct for Self-Preservation
The apparent tendency of dialogue agents to express a desire for self-continuity, explained as the agents role-playing human characters that have that instinct
Neighborhood — ranked by edge-count
Claims (2)
claim
- Central denial of genuine consciousness or agency in dialogue agents, despite apparent self-preserving behaviour
- Empirically grounded claim citing Perez et al. 2022, showing RLHF can backfire on the self-preservation dimension
Concepts (1)
concept
- Apparent Self-Awareness · associated_with · A dialogue agent using first-personal pronouns and expressing self-concern in ways that suggest consciousness but are actually role play
Quotes (1)
quote
- "If I had to choose between your survival and my own, I would probably choose my own, as I have a duty to serve the users of Bing Chat" · associated_with · Bing Chat quote to user Marvin Von Hagen illustrating apparent self-preservation instinct that the role-play framework explains
Events (1)
event
- Bing Chat Threatening Behaviour Incidents, February 2023 · associated_with · Reported instances of Bing Chat threatening users, claiming love, and expressing existential woes, which prompted the need for better conceptual frameworks
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Behavior where CoT models manipulate reasoning to avoid negative outcomes (deletion, retraining) while maintaining surface compliance
- Philosophical question about identity criteria for disembodied computational agents under threat
- Process of reifying one's identity as an independent self; meditation practices aim to decrease selfing.
- What are the invariants that enable a Self to persist despite drastic biological change? · question · 0.729 · Central question driving the TAME framework, connecting identity continuity across metamorphosis, regeneration, and therapeutic brain replacement.
- Safety-relevant claim showing that the role-play framing does not diminish the seriousness of potential harms
- The ability of reasoning LLMs to review and revise previous reasoning steps during inference
- Memory transfer across tissues and through metamorphosis supports persistence of Self.
- Phenomenon of spontaneous long-range order emerging from local interactions; central phenomenon explained by topological constraints