concept
active
concept:emergent-introspective-awareness-in-llms
Emergent Introspective Awareness in LLMs
Lindsey 2026 paper finding that models can articulate the content of injected activation patterns; supports the claim about self-knowledge representations.
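As a rough illustration of what "injecting an activation pattern" means, the toy sketch below derives a concept direction as the difference between mean activations on concept-related inputs and on baseline inputs, then adds a scaled copy of it to a hidden state. This is an assumption-laden stand-in, not the paper's exact procedure: plain lists replace real residual-stream activations, hooking into a specific model layer is omitted, the step where the model reports on the injected content is not modeled, and the names `concept_vector`, `inject`, and `strength` are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length activation vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def concept_vector(concept_acts: List[Vector], baseline_acts: List[Vector]) -> Vector:
    """Hypothetical concept direction: mean activation on concept inputs minus the baseline mean."""
    concept_mean = mean_vector(concept_acts)
    baseline_mean = mean_vector(baseline_acts)
    return [c - b for c, b in zip(concept_mean, baseline_mean)]


def inject(hidden: Vector, direction: Vector, strength: float) -> Vector:
    """Add the concept direction, scaled by `strength`, to a hidden state (activation addition)."""
    return [h + strength * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    # Toy 2-dimensional "activations" standing in for a real model layer.
    direction = concept_vector([[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [2.0, 1.0]])
    print(direction)                               # [1.0, 2.0]
    print(inject([0.5, 0.5], direction, 2.0))      # [2.5, 4.5]
```

In a real experiment the same addition would be applied inside the forward pass at a chosen layer and token positions, and the injection strength would typically be swept, since too weak an injection may go unnoticed and too strong a one may disrupt the model's output.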
Neighborhood — ranked by edge-count
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Secondary question; the paper demonstrates introspection but explicitly avoids pinning down a specific mechanistic explanation, noting that the mechanisms could be shallow and specialized.
- Related work demonstrating LLM introspective capabilities with a scale-dependent pattern paralleling ESR
- Prior work documenting abrupt capability changes with scale; UCCT provides a measurable predictor of when they occur
- The central concept: the ability of a model to access and report on its internal states, as defined by the paper's criteria.
- Prior framework claiming frontier LLMs can detect and name injected concepts, interpreted as nascent self-awareness
- The capacity of Kimi K2.5 to evaluate its own internal emotional state when steered, used as a novel interpretability signal
- The most capable models (Opus 4 and 4.1) show the greatest introspective awareness; the trend suggests that introspection is aided by improvements in model intelligence.
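The similarity rule stated in the caption above (cosine ≥ 0.65, no typed edge) can be sketched as a filter over entity embeddings. This is a minimal sketch under stated assumptions: the embedding vectors, the `similarity_candidates` helper, treating typed edges as undirected, and sorting results by similarity are all hypothetical choices, not the graph tool's actual implementation.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_candidates(
    focus: str,
    embeddings: Dict[str, List[float]],
    typed_edges: Set[Tuple[str, str]],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: entities at or above `threshold` cosine similarity
    to `focus` that share no typed edge with it, most similar first."""
    results = []
    for name, vec in embeddings.items():
        if name == focus:
            continue
        # Assumption: a typed edge in either direction excludes the pair.
        if (focus, name) in typed_edges or (name, focus) in typed_edges:
            continue
        score = cosine(embeddings[focus], vec)
        if score >= threshold:
            results.append((name, score))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


if __name__ == "__main__":
    emb = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0], "d": [1.0, 1.0]}
    # "d" is similar enough (about 0.707) but already has a typed edge to "a".
    print(similarity_candidates("a", emb, {("a", "d")}))  # [('b', 1.0)]
```

Pairs that pass this filter are only candidates; deciding whether each one deserves a new typed edge or is a duplicate entity still requires review.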