claim

active

claim:introspective-ability-can-be-decomposed-into-i-information-available-about-internal-state-and-ii-capacity-to-transform-that-signal-into-precise-output-reports

Introspective ability can be decomposed into: (i) information available about internal state and (ii) capacity to transform that signal into precise output reports

Conceptual distinction motivated by entropy analyses showing probe and report entropy can diverge under steering

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Findings (2)

finding

Focus→wellbeing steering: both probe entropy (1.09→1.67 bits) and report entropy (0.88→1.69 bits) increase monotonically with α
supports
Evidence that improved introspection in focus→wellbeing arises from enriched internal state and report channels simultaneously
Impulsivity→interest steering: probe entropy increases (LMM slope = 0.024, p = 2.30×10⁻⁴) but report entropy does not (p = 0.11)
supports
Evidence of a bottleneck between richer internal variation and final report distribution in impulsivity→interest condition
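
The two findings above compare entropies, in bits, of a probe-derived distribution and a report distribution. As a minimal sketch of that comparison, the Python below computes Shannon entropy in bits and the change in entropy between two conditions. The function names and the four-bin distributions are hypothetical illustrations, not data or code from the source paper, and the paper's own estimation procedure may differ.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution.

    `probs` may be unnormalized non-negative weights; they are
    normalized here. Zero-probability bins contribute nothing.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("distribution must have positive total mass")
    h = 0.0
    for p in probs:
        q = p / total
        if q > 0:
            h -= q * math.log2(q)
    return h


def entropy_change(before, after):
    """Entropy of `after` minus entropy of `before`, in bits."""
    return entropy_bits(after) - entropy_bits(before)


# Hypothetical four-bin distributions (NOT values from the paper).
# Probe: mass spreads from two bins to all four, so entropy rises.
probe_unsteered = [0.5, 0.5, 0.0, 0.0]
probe_steered = [0.25, 0.25, 0.25, 0.25]

# Report: mass moves between bins but stays on two of them,
# so entropy is unchanged even though the distribution differs.
report_unsteered = [0.5, 0.5, 0.0, 0.0]
report_steered = [0.5, 0.0, 0.5, 0.0]

probe_delta = entropy_change(probe_unsteered, probe_steered)
report_delta = entropy_change(report_unsteered, report_steered)

print(f"probe entropy change:  {probe_delta:+.2f} bits")   # +1.00 bits
print(f"report entropy change: {report_delta:+.2f} bits")  # +0.00 bits
```

In this toy case the probe distribution gains one bit while the report distribution gains none, which mirrors the qualitative pattern described for the impulsivity→interest condition: richer internal variation that does not reach the report channel.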

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Either introspection is an emergent capability requiring larger scale, or more stringent controls are needed to test introspection in smaller models · claim · 0.826
Alternative interpretations offered for why binary detection fails in Llama 3.1 8B but frontier models claim success
If introspective ability exists, can it be improved? · question · 0.819
Secondary research question addressed through cross-concept steering experiments
This introspective capacity is highly unreliable and context-dependent in today's models · claim · 0.816
A caveat qualifying the main claim.
Introspective ability is concept-specific: quality differs across emotive concepts and the same intervention helps some concepts but not others · claim · 0.807
Cross-concept steering results; only 2 of 12 non-diagonal cells show significant introspection improvement
Introspective capabilities may continue to develop with further improvements to model capabilities · claim · 0.804
Forward-looking statement about future models.
Introspective capabilities are confined to early-layer injections (L0-L5) and collapse to chance thereafter · claim · 0.803
Key quantitative characterization of the layer-dependence of partial introspection
Introspective capacity is present from the first conversation turn, not requiring multi-turn context to emerge · claim · 0.801
Three of four concepts show significant introspection at turn 1; rules out joint temporal drift as sole explanation
Two-component model of introspective ability · concept · 0.800
Conceptual distinction between (i) information internally available about a state and (ii) capacity to transform that signal into precise output reports