Emergent Introspective Awareness in Large Language Models (Lindsey, 2025)

Related work demonstrating introspective capabilities in LLMs, with a scale-dependent pattern that parallels ESR.

Neighborhood — ranked by edge-count

Papers (1)

paper

Endogenous Resistance to Activation Steering in Language Models
cites

Venues (1)

venue

Transformer Circuits Thread
cites
Anthropic's mechanistic interpretability research blog where this paper was published.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Modern language models possess at least a limited, functional form of introspective awareness
claim · 0.864
The paper's central interpretive assertion.
Our results demonstrate that modern language models possess at least a limited, functional form of introspective awareness.
quote · 0.855
Abstract's main conclusion.
Emergent Introspective Awareness in LLMs
concept · 0.846
Lindsey (2025) paper finding that models can articulate the content of injected activation patterns; supports the claim about self-knowledge representations.
Emergent Introspective Awareness Framework (Lindsey 2026)
framework · 0.820
Prior framework claiming that frontier LLMs can detect and name injected concepts, interpreted as nascent self-awareness.
Introspective awareness correlates with overall model capability
claim · 0.819
The most capable models tested (Claude Opus 4 and 4.1) show the greatest introspective awareness; the trend suggests that introspection is aided by improvements in model intelligence.
Notably, Claude Opus 4.1 and 4—the most recently released and most capable models of those that we test—perform the best in our experiments, suggesting that introspective capabilities may emerge alongside other improvements to language models.
quote · 0.815
Key finding about the relationship between capability and introspection.
Is introspection an emergent property of scale, or do smaller open-weight models exhibit similar capabilities?
question · 0.811
Motivates the comparison of Llama 3.1 8B results against Lindsey's frontier-model findings.
Either introspection is an emergent capability requiring larger scale, or more stringent controls are needed to test introspection in smaller models
claim · 0.811
Alternative interpretations offered for why binary detection fails in Llama 3.1 8B while success is reported for frontier models.