claim

active

claim:this-introspective-capacity-is-highly-unreliable-and-context-dependent-in-today-s-models

This introspective capacity is highly unreliable and context-dependent in today's models

A caveat qualifying the main claim.

Source paper

extracted_from

Emergent Introspective Awareness in Large Language Models

(2026) · Lindsey, Jack

Neighborhood — ranked by edge-count

Communities (3)

community

Mechanistic interpretability & model evaluation
members_of
Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.
Mechanistic introspection in language models
members_of
Empirical investigation of how LMs access and report internal states across layers, using concept injection and thought detection on Claude models.
LLM functional introspective awareness
members_of
Empirical probing of language models' ability to detect and report their own internal concept representations

Concepts (1)

concept

Introspective awareness
about
The central concept: the ability of a model to access and report on its internal states, as defined by the paper's criteria.

Claims (1)

claim

Introspective awareness correlates with overall model capability
associated_with
The most capable models (Claude Opus 4 and 4.1) show the greatest introspective awareness; the trend suggests that introspection is aided by improvements in model intelligence.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

We stress that in today’s models, this capacity is highly unreliable and context-dependent; however, it may continue to develop with further improvements to model capabilities. · quote · 0.862
Caveat and forward-looking statement from the abstract.
Introspective capacity is present from the first conversation turn, not requiring multi-turn context to emerge · claim · 0.850
Three of four concepts show significant introspection at turn 1; rules out joint temporal drift as sole explanation
Introspective capabilities have threshold effects requiring very large models; 70B models are barely on the threshold, and independent researchers lack access to larger models. · claim · 0.846
Practical bottleneck explaining why these phenomena are not widely studied.
We hypothesize that introspective capabilities may scale with model size and architecture, including recurrence/looping that extends the integration window · hypothesis · 0.833
Forward-looking prediction about whether early-layer introspection generalizes to larger models or recurrent architectures
Introspective capabilities may continue to develop with further improvements to model capabilities · claim · 0.830
Forward-looking statement about future models.
Are there examples of models recognizing their introspective capability and then suppressing it? · question · 0.829
Cube Flipper's question prompted by the idea that supernormal capabilities might be hidden.
Either introspection is an emergent capability requiring larger scale, or more stringent controls are needed to test introspection in smaller models · claim · 0.827
Alternative interpretations offered for why binary detection fails in Llama 3.1 8B while success is claimed for frontier models.
Introspective capacity may follow a simple monotonic scaling law across all concepts and architectures · hypothesis · 0.821
The paper treats this as possible but unconfirmed; current evidence shows concept-specific scaling only