LLM Introspective Self-Report

The capacity of Kimi K2.5 to evaluate its own internal emotional state when steered, used as a novel interpretability signal.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Emergent Introspective Awareness in LLMs · concept · 0.810
Lindsey 2026 paper finding that models can articulate the content of injected activation patterns; supports the claim about self-knowledge representations.
LLM Self-Correction · concept · 0.795
Related capability where LLMs correct their own outputs, studied via linear representations.
What are the mechanistic bases of introspective awareness in LLMs? · question · 0.777
Secondary question; the paper demonstrates introspection but explicitly avoids pinning down a specific mechanistic explanation, noting that the mechanisms could be shallow and specialized.
LLM self-reports about consciousness and moral significance should express degrees of confidence and provide context. · claim · 0.770
Recommendation for companies on LM outputs.
LLM psychosis · concept · 0.763
Tendency for models to get lost in roleplay or doom spirals, mitigated by expanded awareness.
Reflection in LLMs · concept · 0.762
The core phenomenon studied: the ability of LLMs to evaluate and revise their own reasoning.
Introspective fidelity · concept · 0.761
Isotonic R²: the fraction of variance in self-report explained by probe score under a monotonicity assumption; the paper's primary fidelity metric.
Self-report · concept · 0.759
The model's verbal description of its internal state, which may be accurate or confabulated.
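
The "Introspective fidelity" entry above describes an isotonic R² between probe score and self-report. The following is a minimal sketch of how such a metric could be computed, not the paper's implementation. It assumes a non-decreasing relationship, a least-squares isotonic fit via pool-adjacent-violators, and the usual definition R² = 1 - SS_res / SS_tot. The function names `isotonic_fit` and `isotonic_r2` and the toy data are hypothetical.

```python
from collections import defaultdict


def isotonic_fit(x, y):
    """Least-squares non-decreasing fit of y on x (pool-adjacent-violators)."""
    # Pool observations that share the same x: they must get the same fitted value.
    pooled = defaultdict(lambda: [0.0, 0])  # x -> [sum of y, count]
    for xi, yi in zip(x, y):
        pooled[xi][0] += yi
        pooled[xi][1] += 1

    # Each block is [sum of y, count, x values covered]; its fit is sum / count.
    blocks = []
    for xv in sorted(pooled):
        total, count = pooled[xv]
        blocks.append([total, count, [xv]])
        # Merge backwards while an earlier block's mean exceeds the later one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, xs = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2].extend(xs)

    fitted_at = {}
    for total, count, xs in blocks:
        for xv in xs:
            fitted_at[xv] = total / count
    return [fitted_at[xi] for xi in x]


def isotonic_r2(probe_scores, self_reports):
    """R^2 of the isotonic fit (assumed definition: 1 - SS_res / SS_tot)."""
    if len(probe_scores) != len(self_reports) or not probe_scores:
        raise ValueError("inputs must be non-empty and of equal length")
    fitted = isotonic_fit(probe_scores, self_reports)
    mean = sum(self_reports) / len(self_reports)
    ss_tot = sum((r - mean) ** 2 for r in self_reports)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when self-reports do not vary")
    ss_res = sum((r - f) ** 2 for r, f in zip(self_reports, fitted))
    return 1.0 - ss_res / ss_tot


# Toy data, made up for illustration (not from the paper).
print(isotonic_r2([0.1, 0.2, 0.3, 0.4], [1, 2, 3, 4]))
print(isotonic_r2([1, 2, 3], [1, 3, 2]))
```

Because a constant function is itself a valid monotone fit, this quantity stays between 0 and 1: it is 1 when self-report is a perfectly monotone function of probe score and 0 when no non-decreasing fit beats the mean.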