quote
active
quote:we-operationalize-introspection-as-causal-informational-coupling-between-a-numeric-self-report-and-an-independently-measured-internal-direction
we operationalize introspection as causal informational coupling between a numeric self-report and an independently measured internal direction
Load-bearing operational definition that distinguishes the paper's framework from prior approaches
Source paper
extracted_from (2026) · Nicolas Martorell · Bruno Bianchi
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Explicit scope limitation following Comsa & Shanahan 2025 and McClelland 2024
- Alternative interpretations offered for why binary detection fails in Llama 3.1 8B but frontier models claim success
- Key discriminating question motivating the baseline control experiment
- Interpretive claim about the mechanistic substrate of introspection in LLMs
- Conceptual distinction motivated by entropy analyses showing probe and report entropy can diverge under steering
- Models can distinguish artificially prefilled outputs from intentional responses by referencing prior internal representations; injecting a matching concept vector causes the model to retroactively accept the prefill as intentional.
- Core conceptual distinction introduced at the start; defines the paper's central problem.
- Forward-looking prediction about whether early-layer introspection generalizes to larger models or recurrent architectures
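The neighborhood rule stated for this list (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal illustration assuming each entity carries an embedding vector and typed edges are stored as unordered entity-name pairs; the function `related_by_similarity`, the toy vectors, and the edge set are hypothetical and do not describe the knowledge base's actual implementation. Only the 0.65 threshold comes from the page.

```python
import math

SIM_THRESHOLD = 0.65  # threshold shown on the page; everything else here is hypothetical


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def related_by_similarity(target, embeddings, typed_edges, threshold=SIM_THRESHOLD):
    """Hypothetical helper: return (entity, score) pairs whose cosine similarity to
    `target` meets the threshold and that share no typed edge with it, sorted by
    descending similarity."""
    # Entities already connected to the target by a typed edge, in either direction.
    linked = {b for a, b in typed_edges if a == target} | {a for a, b in typed_edges if b == target}
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy usage with made-up entity names and vectors.
embeddings = {
    "definition": [1.0, 0.0, 0.0],
    "scope_limitation": [0.9, 0.1, 0.0],
    "prefill_detection": [0.8, 0.0, 0.6],
    "unrelated": [0.0, 1.0, 0.0],
}
typed_edges = {("definition", "prefill_detection")}
print([name for name, _ in related_by_similarity("definition", embeddings, typed_edges)])
# → ['scope_limitation']
```

Here `prefill_detection` clears the threshold (cosine 0.8) but is excluded because it already has a typed edge, which mirrors why the list above surfaces only untyped neighbors as candidates for new edges or duplicates.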