concept
active
concept:llm-psychosis
LLM psychosis
Tendency for models to get lost in roleplay or doom spirals, mitigated by expanded awareness.
Neighborhood — ranked by edge-count
Concepts (1)
concept
- expanded awareness (associated_with): Wide attentional radius with all-to-all correlation, associated with Claude models; enables better self-monitoring and alignment.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Problem cited as a shortcoming of current LLMs; PRH predicts that hallucinations should decrease with scale.
- The ability of LLMs to monitor and evaluate their own reasoning, closely related to reflection.
- Related capability where LLMs correct their own outputs, studied via linear representations.
- The capacity of Kimi K2.5 to evaluate its own internal emotional state when steered, used as a novel interpretability signal.
- Prior work framework studying whether LLMs encode world models as linear structures in their representations.
- Directions in activation space associated with contrastive emotive concept pairs, studied in this paper as targets for introspection.
- The finding that interpretable concepts, including character traits, are encoded as linear directions in transformer residual streams.
- High-dimensional vectors produced at each transformer layer for each input token; the primary substrate analyzed in this study.
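The "Related by similarity" criterion above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. It assumes each entity has an embedding vector and that typed edges are stored as pairs of entity ids; the function names, the toy ids, and the toy vectors are all hypothetical and are not taken from this page's actual backend.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs that are similar to `target`
    but share no typed edge with it, sorted by descending score."""
    # Entities already connected to the target by a typed edge are excluded.
    linked = {b if a == target else a for a, b in typed_edges if target in (a, b)}
    hits = []
    for entity_id, vec in embeddings.items():
        if entity_id == target or entity_id in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((entity_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: 2-D vectors stand in for real embeddings.
embeddings = {
    "concept:target": [1.0, 0.0],
    "concept:linked": [0.9, 0.1],     # similar, but already has a typed edge
    "concept:neighbor": [0.8, 0.6],   # similar, no typed edge
    "concept:far": [0.0, 1.0],        # orthogonal to the target
}
typed_edges = [("concept:target", "concept:linked")]

candidates = related_by_similarity("concept:target", embeddings, typed_edges)
print(candidates)
```

With this toy data only `concept:neighbor` is returned: `concept:linked` is filtered out because it already has a typed edge, and `concept:far` falls below the threshold. Each returned entity would then be reviewed as a candidate for a new edge or as a possible duplicate.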