claim
active
This introspective capacity is highly unreliable and context-dependent in today's models
A caveat qualifying the main claim.
Source paper
extracted_from · Lindsey, Jack (2026)
Neighborhood (ranked by edge count)
Communities (3)
community
- Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.
- Empirical investigation of how LMs access and report internal states across layers, using concept injection and thought detection on Claude models.
- LLM functional introspective awareness (members_of): Empirical probing of language models' ability to detect and report their own internal concept representations
Concepts (1)
concept
- The central concept: the ability of a model to access and report on its internal states, as defined by the paper's criteria.
Claims (1)
claim
- The most capable models tested (Claude Opus 4 and 4.1) show the greatest introspective awareness; the trend suggests introspection is aided by improvements in model intelligence.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Caveat and forward-looking statement from the abstract.
- Three of four concepts show significant introspection at turn 1; this rules out joint temporal drift as the sole explanation
- Practical bottleneck explaining why these phenomena are not widely studied.
- Forward-looking prediction about whether early-layer introspection generalizes to larger models or recurrent architectures
- Introspective capabilities may continue to develop with further improvements to model capabilities (claim, 0.830): Forward-looking statement about future models.
- Are there examples of models recognizing their introspective capability and then suppressing it? (question, 0.829): Cube Flipper's question, prompted by the idea that supernormal capabilities might be hidden.
- Alternative interpretations offered for why binary detection fails in Llama 3.1 8B while frontier models claim success
- Introspective capacity may follow a simple monotonic scaling law across all concepts and architectures (hypothesis, 0.821): The paper treats this as possible but unconfirmed; current evidence shows only concept-specific scaling.