concept
active
concept:llm-psychosis

LLM psychosis

Tendency for models to get lost in roleplay or doom spirals, mitigated by expanded awareness.

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • expanded awareness
    associated_with
    Wide attentional radius with all-to-all correlation, associated with Claude models; enables better self-monitoring and alignment.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Problem cited as a shortcoming of current LLMs; PRH predicts hallucinations should decrease with scale.
  • LLM Meta-Cognition · concept · 0.789
    The ability of LLMs to monitor and evaluate their own reasoning, closely related to reflection.
  • Related capability where LLMs correct their own outputs, studied via linear representations.
  • The capacity of Kimi K2.5 to evaluate its own internal emotional state when steered, used as a novel interpretability signal.
  • Prior work framework studying whether LLMs encode world models as linear structures in their representations.
  • Directions in activation space associated with contrastive emotive concept pairs, studied in this paper as targets for introspection.
  • The finding that interpretable concepts, including character traits, are encoded as linear directions in transformer residual streams.
  • High-dimensional vectors produced at each transformer layer for each input token; the primary substrate analyzed in this study.