concept
active
concept:emergent-introspective-awareness-in-large-language-models-lindsey-2025
Emergent Introspective Awareness in Large Language Models (Lindsey, 2025)
Related work demonstrating LLM introspective capabilities, with a scale-dependent pattern that parallels ESR.
Neighborhood — ranked by edge-count
Papers (1)
paper
Venues (1)
venue
- Anthropic's mechanistic interpretability research blog where this paper was published.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Modern language models possess at least a limited, functional form of introspective awareness · claim · 0.864 · The paper's central interpretive assertion.
- Abstract's main conclusion.
- Lindsey 2026 paper finding that models can articulate content of injected activation patterns; supports claim about self-knowledge representations
- Prior framework claiming frontier LLMs can detect and name injected concepts, interpreted as nascent self-awareness
- The most capable models (Opus 4, 4.1) show the greatest introspective awareness; the trend suggests that introspection is aided by improvements in model intelligence.
- Key finding about the relationship between capability and introspection.
- Is introspection an emergent property of scale, or do smaller open-weight models exhibit similar capabilities? · question · 0.811 · Motivates comparison of Llama 3.1 8B results against Lindsey's frontier model findings
- Alternative interpretations offered for why binary detection fails in Llama 3.1 8B while success is reported for frontier models
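
The "Related by similarity" filter described above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the graph's actual implementation: the function names, the dictionary of embeddings, and the set-of-pairs edge store are all assumptions made for the example; only the 0.65 threshold and the "no typed edge" condition come from the page.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similar_untyped(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs, most similar first.

    embeddings  : dict mapping entity id -> embedding vector (hypothetical store)
    typed_edges : set of frozenset({id_a, id_b}) pairs that already have a
                  typed relation (hypothetical store)
    threshold   : minimum cosine similarity; 0.65 matches the page's cutoff
    """
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # skip the entity itself
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed edge, so not a candidate
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

Each returned pair is a candidate for a new typed edge or a possible duplicate, which is how the page frames the entries in this list.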