LLM psychosis

Tendency for models to get lost in roleplay or doom spirals, mitigated by expanded awareness.

Neighborhood — ranked by edge-count

concept

expanded awareness
associated_with
Wide attentional radius with all-to-all correlation, associated with Claude models; enables better self-monitoring and alignment.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Hallucination in LLMs · concept · 0.816
Problem cited as a shortcoming of current LLMs; PRH predicts hallucinations should decrease with scale.
LLM Meta-Cognition · concept · 0.789
The ability of LLMs to monitor and evaluate their own reasoning, closely related to reflection.
LLM Self-Correction · concept · 0.765
Related capability where LLMs correct their own outputs, studied via linear representations.
LLM Introspective Self-Report · concept · 0.763
The capacity of Kimi K2.5 to evaluate its own internal emotional state when steered, used as a novel interpretability signal.
Linear World Models in LLMs · framework · 0.743
Prior work framework studying whether LLMs encode world models as linear structures in their representations.
Emotive states in LLMs · concept · 0.742
Directions in activation space associated with contrastive emotive concept pairs studied in this paper as targets for introspection.
Linear Representation of Concepts in LLMs · concept · 0.742
The finding that interpretable concepts including character traits are encoded as linear directions in transformer residual streams.
LLM Internal Representations · concept · 0.742
High-dimensional vectors produced at each transformer layer for each input token; the primary substrate analyzed in this study.
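
The selection rule behind the list above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy two-dimensional embeddings, and the edge representation are hypothetical and are not taken from the system that produced this page.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities that are similar to `target`
    but share no typed edge with it, sorted by descending score."""
    # Entities already linked to the target in either direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}

    candidates = []
    for name, vector in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vector)
        if score >= threshold:
            candidates.append((name, score))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)

# Toy data (hypothetical vectors, chosen only to exercise the filter).
embeddings = {
    "LLM psychosis": [1.0, 0.0],
    "expanded awareness": [0.9, 0.1],      # similar, but already has a typed edge
    "Hallucination in LLMs": [0.8, 0.6],   # similar, no typed edge
    "unrelated entity": [0.0, 1.0],        # below the threshold
}
typed_edges = [("LLM psychosis", "expanded awareness")]

candidates = untyped_neighbors("LLM psychosis", embeddings, typed_edges)
```

With this toy data only "Hallucination in LLMs" survives the filter: "expanded awareness" is excluded because it already has a typed edge, and "unrelated entity" falls below the threshold.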