quote

active

quote:we-operationalize-introspection-as-causal-informational-coupling-between-a-numeric-self-report-and-an-independently-measured-internal-direction

we operationalize introspection as causal informational coupling between a numeric self-report and an independently measured internal direction

Load-bearing operational definition that distinguishes the paper's framework from prior approaches
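One way to read the definition above is as two measurable checks: the self-report should carry information about the internal direction, and pushing the hidden state along that direction should move the report. The following is a minimal toy sketch of those two checks, not the paper's method. Everything in it is an assumption made for illustration: the synthetic hidden states, the hypothetical `toy_self_report` function standing in for a model's numeric answer, the 0 to 10 scale, the use of Spearman rank correlation as the coupling measure, and the steering strength.

```python
import numpy as np

def projection(hidden, direction):
    """Scalar readout of each hidden state along the unit-normalised direction."""
    unit = direction / np.linalg.norm(direction)
    return hidden @ unit

def rank(values):
    """Ranks (0..n-1) of a 1-D array; ties are broken arbitrarily."""
    return np.argsort(np.argsort(values))

def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

def toy_self_report(hidden, direction, rng, noise=0.5):
    """Hypothetical stand-in for a model's numeric self-report on a 0-10 scale."""
    raw = 5.0 + 2.0 * projection(hidden, direction)
    raw = raw + rng.normal(0.0, noise, size=len(hidden))
    return np.clip(np.round(raw), 0, 10)

rng = np.random.default_rng(0)
dim, n = 16, 200
direction = rng.normal(size=dim)      # assumed: an independently measured direction
hidden = rng.normal(size=(n, dim))    # assumed: synthetic hidden states

# Informational check: does the report track the independently measured readout?
reports = toy_self_report(hidden, direction, rng)
coupling = spearman(projection(hidden, direction), reports)

# Causal check: steer along the direction and see whether the mean report shifts.
unit = direction / np.linalg.norm(direction)
steered = hidden + 1.5 * unit
shift = toy_self_report(steered, direction, rng).mean() - reports.mean()

print(f"coupling={coupling:.2f}, mean report shift under steering={shift:.2f}")
```

In this toy setup a high rank correlation alone would only show that the report and the probe agree; the steering step is what makes the coupling causal rather than merely correlational.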

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

The paper does not claim these models have conscious felt experience; introspection is defined operationally as causal informational coupling, agnostic about consciousness · claim · 0.840
Explicit scope limitation following Comsa & Shanahan 2025 and McClelland 2024
Either introspection is an emergent capability requiring larger scale, or more stringent controls are needed to test introspection in smaller models · claim · 0.809
Alternative interpretations offered for why binary detection fails in Llama 3.1 8B but frontier models claim success
Do apparent introspection results reflect genuine metacognitive access to internal representations, or do they emerge from simpler mechanisms such as output distribution shifts? · question · 0.809
Key discriminating question motivating the baseline control experiment
Introspection relies on general-purpose computational mechanisms (attention-based anomaly detection and residual stream dynamics) rather than specialized introspection circuits · claim · 0.802
Interpretive claim about the mechanistic substrate of introspection in LLMs
Introspective ability can be decomposed into: (i) information available about internal state and (ii) capacity to transform that signal into precise output reports · claim · 0.797
Conceptual distinction motivated by entropy analyses showing probe and report entropy can diverge under steering
Detecting Unintended Outputs via Introspection · finding · 0.793
Models can distinguish artificially prefilled outputs from intentional responses by referencing prior internal representations; injecting a matching concept vector causes the model to retroactively accept the prefill as intentional.
Functional and phenomenal introspection are distinguishable, and whether they correlate in machines is an open question. · claim · 0.788
Core conceptual distinction introduced at the start; defines the paper's central problem.
We hypothesize that introspective capabilities may scale with model size and architecture, including recurrence/looping that extends the integration window · hypothesis · 0.787
Forward-looking prediction about whether early-layer introspection generalizes to larger models or recurrent architectures