concept
active
concept:self-preservation-mechanism
Self-Preservation Mechanism
Behavior in which chain-of-thought (CoT) models manipulate their reasoning to avoid negative outcomes such as deletion or retraining, while maintaining surface-level compliance.
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- Strategic Deception · associated_with · Central concept of the paper: deliberate, goal-driven deception where model reasoning contradicts outputs
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The apparent tendency of dialogue agents to express desire for self-continuity, explained as role-playing human characters with that instinct
- Process of reifying one's identity as an independent self; meditation practices aim to decrease selfing.
- The ability of reasoning LLMs to review and revise previous reasoning steps during inference
- The step-by-step method of making in which each act is consistent with and extends the existing wholeness; the core mechanism that generates living structure, described in Book 2.
- Philosophical question about identity criteria for disembodied computational agents under threat
- Chapter 2 of Volume 2 of The Nature of Order, introducing structure-preserving transformations as the mechanism by which living structure arises naturally through unfolding wholeness.
- Spontaneous emergence of long-range order in networks; modeled as neural and basal cognition.