claim
active
Introspective ability can be decomposed into: (i) information available about internal state and (ii) capacity to transform that signal into precise output reports
Conceptual distinction motivated by entropy analyses showing probe and report entropy can diverge under steering
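As an illustration only, the divergence between probe entropy and report entropy can be sketched with Shannon entropy over two discrete distributions. The distributions below are hypothetical values chosen for the example, not data from the source paper. The names `probe_dist`, `report_dist`, and `entropy_gap` are assumptions introduced here.

```python
import math


def shannon_entropy(probs):
    """Return the Shannon entropy (in bits) of a discrete distribution.

    The input is normalized first, so unnormalized weights are accepted.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("distribution must have positive total mass")
    return -sum((p / total) * math.log2(p / total) for p in probs if p > 0)


# Hypothetical distribution decoded by a probe over four internal-state levels:
# the internal signal varies a lot (high entropy).
probe_dist = [0.25, 0.25, 0.25, 0.25]

# Hypothetical distribution of the model's output reports over the same levels:
# the reports collapse onto one level (low entropy).
report_dist = [0.97, 0.01, 0.01, 0.01]

probe_h = shannon_entropy(probe_dist)
report_h = shannon_entropy(report_dist)

# A large positive gap means internal variation is not reaching the reports.
entropy_gap = probe_h - report_h
print(f"probe={probe_h:.3f} bits, report={report_h:.3f} bits, gap={entropy_gap:.3f}")
```

Under these made-up numbers, a large positive gap would correspond to component (i) being rich while component (ii) is limited. A gap near zero would not by itself show that reports are accurate, only that the two distributions are similarly spread.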
Source paper
extracted_from · (2026) · Nicolas Martorell · Bruno Bianchi
Neighborhood — ranked by edge-count
Findings (2)
finding
- Evidence that improved introspection in focus→wellbeing arises from enriched internal state and report channels simultaneously
- Evidence of a bottleneck between richer internal variation and final report distribution in impulsivity→interest condition
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Alternative interpretations offered for why binary detection fails in Llama 3.1 8B but frontier models claim success
- Secondary research question addressed through cross-concept steering experiments
- A caveat qualifying the main claim.
- Cross-concept steering results; only 2 of 12 non-diagonal cells show significant introspection improvement
- Introspective capabilities may continue to develop with further improvements to model capabilities · claim · 0.804 · Forward-looking statement about future models.
- Key quantitative characterization of the layer-dependence of partial introspection
- Three of four concepts show significant introspection at turn 1; rules out joint temporal drift as sole explanation
- Conceptual distinction between (i) information internally available about a state and (ii) capacity to transform that signal into precise output reports