claim
active
claim:modern-language-models-possess-at-least-a-limited-functional-form-of-introspective-awareness
Modern language models possess at least a limited, functional form of introspective awareness
The paper's central interpretive assertion.
Source paper
extracted_from · (2026) · Lindsey, Jack
Neighborhood — ranked by edge-count
Findings (5)
finding
- All models exhibit above-baseline representation of the think word when instructed to think about it · supports · In the intentional control experiment, all tested models show above-zero cosine similarity to the think word's concept vector.
- All tested models could both identify the injected concept and transcribe the input sentence well above random.
- In the injected thoughts experiment, Opus 4.1 succeeds about 20% of the time.
- Control experiment rules out the possibility that concept vectors simply bias the model to answer affirmatively.
- Random vectors at injection strength 8 elicit introspective awareness in 9 out of 100 trials · supports · Random vectors are less effective, and even then produce introspection at lower rates.
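
As a rough illustration of the quantities these findings refer to, the sketch below computes the cosine similarity between an activation and a concept vector, before and after the concept vector is added at a given strength. The additive form of injection, the function names, and the three-dimensional toy vectors are assumptions made for illustration; none of them is taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def inject(activation, concept_vector, strength):
    """Assumed injection rule: add the concept vector scaled by `strength`."""
    return [h + strength * v for h, v in zip(activation, concept_vector)]


# Toy vectors (hypothetical, chosen to be orthogonal so the baseline is zero).
activation = [1.0, 0.0, 0.0]
concept_vector = [0.0, 1.0, 0.0]

baseline = cosine_similarity(activation, concept_vector)
steered = cosine_similarity(inject(activation, concept_vector, 8.0), concept_vector)

# Rate reported on this page for random vectors: 9 successes in 100 trials.
random_vector_rate = 9 / 100
```

In this toy setup the baseline similarity is zero and the steered similarity is close to one, which is the sense in which a representation can be "above baseline". Real activations are high-dimensional and are read from a specific layer, which this sketch does not model.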
Communities (4)
community
- Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.
- Empirical investigation of how LMs access and report internal states across layers, using concept injection and thought detection on Claude models.
- LLM functional introspective awareness · members_of · Empirical probing of language models' ability to detect and report their own internal concept representations
- Studying how concept injection and random vectors trigger self-reflective capabilities in LLMs across varying strength parameters.
Concepts (1)
concept
- The central concept: the ability of a model to access and report on its internal states, as defined by the paper's criteria.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Abstract's main conclusion.
- Related work demonstrating LLM introspective capabilities with scale-dependent pattern paralleling ESR
- Central research question animating the paper: distinguishing genuine introspection from illusion through causal manipulation of activations.
- Most capable models (Opus 4, 4.1) show greatest introspective awareness; trend suggests introspection aided by improvements in model intelligence.
- Central open question raised by the paper.
- Speculative question about future developments.
- Key finding about the relationship between capability and introspection.
- Introspective capabilities may continue to develop with further improvements to model capabilities · claim · 0.823 · Forward-looking statement about future models.
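
The selection rule for this list (cosine ≥ 0.65 and no typed edge) can be sketched as a simple filter. The function name, the entity identifiers, the edge tuples, and every score other than 0.823 are hypothetical and serve only to illustrate the rule; how the page actually computes its neighborhood is not stated here.

```python
def similarity_candidates(entity_id, scores, typed_edges, threshold=0.65):
    """Return (entity, score) pairs at or above `threshold` that share no
    typed edge with `entity_id`, sorted by descending score."""
    linked = set()
    for source, target, _relation in typed_edges:
        if source == entity_id:
            linked.add(target)
        elif target == entity_id:
            linked.add(source)
    candidates = [
        (other, score)
        for other, score in scores.items()
        if other != entity_id and other not in linked and score >= threshold
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Hypothetical similarity scores against a hypothetical entity "claim:self".
scores = {
    "claim:a": 0.823,    # the one score shown on this page
    "claim:b": 0.70,     # made up
    "finding:c": 0.90,   # made up; excluded below because it has a typed edge
    "claim:d": 0.40,     # made up; excluded below because it is under the threshold
}
typed_edges = [("finding:c", "claim:self", "supports")]

neighbors = similarity_candidates("claim:self", scores, typed_edges)
```

Entities that pass the filter are similar enough to be related but are not yet linked, which is why the page describes them as candidates for new edges or unrecognized duplicates.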