Two-component model of introspective ability

Conceptual distinction between (i) information internally available about a state and (ii) capacity to transform that signal into precise output reports

Neighborhood — ranked by edge-count

Thinkers (2)

thinker

Stephen M. Fleming
studies
Annika Boldt
studies
Studied partially overlapping neural correlates of metacognitive monitoring and control; cited for two-component introspection model

Communities (1)

community

LLM Introspection
extends

Concepts (1)

concept

Introspective fidelity
extends
Isotonic R² measuring fraction of variance in self-report explained by probe score under monotonicity assumption; the paper's primary fidelity metric
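One way to read the isotonic R² definition above, as a minimal sketch rather than the paper's actual procedure: fit the least-squares non-decreasing function of probe score to the self-reports, then report one minus the ratio of residual to total variance. The function names `isotonic_fit` and `isotonic_r2`, the non-decreasing direction, and the pooling of tied probe scores are all assumptions made for illustration; the source does not specify them.

```python
def isotonic_fit(xs, ys):
    """Least-squares non-decreasing fit of ys on xs (pool adjacent violators).

    Assumption: points with identical x are pooled to a single fitted value.
    """
    # Pool tied x values into (sum of y, count).
    groups = {}
    for x, y in zip(xs, ys):
        s, n = groups.get(x, (0.0, 0))
        groups[x] = (s + y, n + 1)

    # Each block is [sum of y, count, x values covered by the block].
    blocks = []
    for x in sorted(groups):
        s, n = groups[x]
        blocks.append([s, n, [x]])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s2, n2, xs2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += n2
            blocks[-1][2].extend(xs2)

    # Map every x to the mean of the block that contains it.
    level = {}
    for s, n, block_xs in blocks:
        for x in block_xs:
            level[x] = s / n
    return [level[x] for x in xs]


def isotonic_r2(probe, report):
    """Fraction of variance in `report` explained by a monotone function of `probe`."""
    fitted = isotonic_fit(probe, report)
    mean = sum(report) / len(report)
    ss_tot = sum((y - mean) ** 2 for y in report)
    if ss_tot == 0:
        raise ValueError("self-reports have zero variance; R^2 is undefined")
    ss_res = sum((y - f) ** 2 for y, f in zip(report, fitted))
    return 1.0 - ss_res / ss_tot
```

Under these assumptions, a self-report that rises strictly with probe score scores 1.0, and a self-report that falls strictly with probe score scores 0.0, because the best non-decreasing fit is then the constant mean.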

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

This introspective capacity is highly unreliable and context-dependent in today's models · claim · 0.802
A caveat qualifying the main claim.
Introspective capabilities may continue to develop with further improvements to model capabilities · claim · 0.802
Forward-looking statement about future models.
Introspective ability can be decomposed into: (i) information available about internal state and (ii) capacity to transform that signal into precise output reports · claim · 0.800
Conceptual distinction motivated by entropy analyses showing probe and report entropy can diverge under steering
We hypothesize that introspective capabilities may scale with model size and architecture, including recurrence/looping that extends the integration window · hypothesis · 0.791
Forward-looking prediction about whether early-layer introspection generalizes to larger models or recurrent architectures
Introspective awareness correlates with overall model capability · claim · 0.790
The most capable models (Opus 4, 4.1) show the greatest introspective awareness; the trend suggests that introspection is aided by improvements in model intelligence.
Introspective Exploration Component · framework · 0.787
The novel framework introduced in the paper: an HMM-based pain-belief signal integrated into the reward function to drive exploration
model introspection · concept · 0.786
The capacity of a model to self-report on its internal emotional state when its SAE features are steered, used here as a measurement tool
If introspective ability exists, can it be improved? · question · 0.783
Secondary research question addressed through cross-concept steering experiments