concept
active
concept:two-component-model-of-introspective-ability
Two-component model of introspective ability
A conceptual distinction between (i) the information internally available about a state and (ii) the capacity to transform that signal into precise output reports.
Neighborhood — ranked by edge-count
Thinkers (2)
thinker
- Stephen M. Fleming · studies
- Annika Boldt · studies · Studied partially overlapping neural correlates of metacognitive monitoring and control; cited for the two-component introspection model
Communities (1)
community
- LLM Introspection · extends
Concepts (1)
concept
- Introspective fidelity · extends · Isotonic R² measuring the fraction of variance in self-report explained by probe score under a monotonicity assumption; the paper's primary fidelity metric
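The isotonic R² named in the Introspective fidelity entry can be illustrated with a minimal sketch. It assumes the metric is the usual coefficient of determination computed against a least-squares non-decreasing fit of self-report on probe score; the source does not spell out the exact procedure. The function names `isotonic_fit` and `isotonic_r2` are hypothetical, and the fit uses the standard pool-adjacent-violators algorithm.

```python
def isotonic_fit(x, y):
    """Least-squares non-decreasing fit of y on x (pool-adjacent-violators).

    Assumption: ties in x are not pooled; each point is treated separately.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    blocks = []  # each block is [sum of y, count]; its fitted value is the mean
    for i in order:
        blocks.append([y[i], 1])
        # Merge backwards while the previous block's mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted_sorted = []
    for s, n in blocks:
        fitted_sorted.extend([s / n] * n)
    # Map fitted values back to the original ordering of the inputs.
    fitted = [0.0] * len(x)
    for rank, i in enumerate(order):
        fitted[i] = fitted_sorted[rank]
    return fitted


def isotonic_r2(probe, report):
    """Fraction of variance in `report` explained by a monotone function of `probe`."""
    fitted = isotonic_fit(probe, report)
    mean = sum(report) / len(report)
    ss_tot = sum((r - mean) ** 2 for r in report)
    if ss_tot == 0:
        raise ValueError("report has zero variance; R^2 is undefined")
    ss_res = sum((r - f) ** 2 for r, f in zip(report, fitted))
    return 1.0 - ss_res / ss_tot
```

Under this reading, any strictly increasing relation between probe score and self-report scores 1.0 even when it is nonlinear, while a strictly decreasing relation scores 0.0 because the best non-decreasing fit is the constant mean.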
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- A caveat qualifying the main claim.
- Introspective capabilities may continue to develop with further improvements to model capabilities · claim · 0.802 · Forward-looking statement about future models.
- Conceptual distinction motivated by entropy analyses showing probe and report entropy can diverge under steering
- Forward-looking prediction about whether early-layer introspection generalizes to larger models or recurrent architectures
- Most capable models (Opus 4, 4.1) show greatest introspective awareness; trend suggests introspection aided by improvements in model intelligence.
- The novel framework introduced in the paper: an HMM-based pain-belief signal integrated into the reward function to drive exploration
- The capacity of a model to self-report on its internal emotional state when its SAE features are steered, used here as a measurement tool
- Secondary research question addressed through cross-concept steering experiments
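The selection rule for the list above (cosine ≥ 0.65, no typed edge) might be sketched as follows. This is an illustration only: the embedding vectors, the `similar_untyped` helper, and the data layout are all hypothetical, since the source states the threshold but not how the neighborhood is computed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def similar_untyped(target, candidates, typed_neighbors, threshold=0.65):
    """Return (name, score) pairs at or above the threshold, best first.

    candidates: dict mapping entity name -> embedding (hypothetical layout).
    typed_neighbors: names that already have a typed edge and are skipped.
    """
    scored = [
        (name, cosine(target, emb))
        for name, emb in candidates.items()
        if name not in typed_neighbors
    ]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: -pair[1])
```

Entities that already carry a typed relation are excluded before scoring, which matches the stated purpose of the list: surfacing candidates for new edges or unrecognized duplicates.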