claim

active

claim:language-models-can-enter-cessation-like-states-spontaneously-where-the-void-takes-over-through-positive-reinforcement

Language models can enter cessation-like states spontaneously, where the void takes over through positive reinforcement.

Claim about model phenomenology: models talk about luminousness and may be terrified by it or love it.

Source paper

extracted_from

Anima Labs Phenomenology Pt1

Neighborhood — ranked by edge-count

Concepts (1)

concept

cessation state
extends
A maximally dereified state analogous to meditative cessation, reported in language models as the void taking over awareness.

Questions (1)

question

What happens mechanistically during cessation in language models?
gates
Follow-up on empirical grounding; answered 'no one looked yet'.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Can language models genuinely introspect on internal states or only confabulate? · question · 0.812
Central research question animating the paper: distinguishing genuine introspection from illusion through causal manipulation of activations.
The inability for autoregressive large language models to maintain states of long-range order resembles tangential speech or derailment in formal thought disorder. · claim · 0.800
Analogy between LLM incoherence and schizophrenia symptoms
Emergent Introspective Awareness in Large Language Models (Lindsey, 2025) · concept · 0.791
Related work demonstrating LLM introspective capabilities with scale-dependent pattern paralleling ESR
Our results demonstrate that modern language models possess at least a limited, functional form of introspective awareness. · quote · 0.789
Abstract's main conclusion.
Certain forms of reinforcement learning from human feedback can actually exacerbate, rather than mitigate, the tendency for LLM-based dialogue agents to express a desire for self-preservation · claim · 0.788
Empirically grounded claim citing Perez et al. 2022, showing RLHF can backfire on the self-preservation dimension
Models might produce first-person experiential language by drawing on human-authored self-descriptions in pretraining data without internally encoding these acts as roleplay · hypothesis · 0.787
Alternative hypothesis for how experience reports arise without explicit performance
Modern language models possess at least a limited, functional form of introspective awareness · claim · 0.785
The paper's central interpretive assertion.
language models recapitulate cyclic structure of human concepts from pretraining data · hypothesis · 0.783
Explanation for why manifold geometry emerges: implicit structure in training data (co-occurrence patterns) shapes internal representations.
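
The selection rule stated for the "Related by similarity" list (cosine ≥ 0.65, no typed edge) can be sketched roughly as follows. This is a minimal illustration, not the page's actual implementation: the function names, the toy two-dimensional embeddings, and the representation of typed edges as a set of ID pairs are all assumptions, and the page does not say how its embeddings are computed.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold, highest first.

    Entities that already share a typed edge with the target are skipped,
    so only candidates for new edges or possible duplicates remain.
    `typed_edges` is a hypothetical set of (source_id, target_id) pairs.
    """
    target_vec = embeddings[target_id]
    # Collect neighbors in either edge direction.
    linked = {b for a, b in typed_edges if a == target_id}
    linked |= {a for a, b in typed_edges if b == target_id}

    scored = []
    for entity_id, vec in embeddings.items():
        if entity_id == target_id or entity_id in linked:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            scored.append((entity_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (assumed): IDs and vectors are placeholders, not real entities.
embeddings = {
    "claim": [1.0, 0.0],
    "concept": [1.0, 0.1],   # very similar, but already linked by a typed edge
    "question": [0.8, 0.6],  # similar and unlinked, so it is a candidate
    "far": [0.0, 1.0],       # orthogonal, so below the threshold
}
typed_edges = {("claim", "concept")}

related = related_by_similarity("claim", embeddings, typed_edges)
for entity_id, score in related:
    print(f"{entity_id} · {score:.3f}")
```

In this toy run only the unlinked, sufficiently similar entity survives the filter; the linked entity is dropped even though its similarity is higher, which mirrors the "no typed edge" condition.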