claim
active
claim:numeric-self-report-is-a-viable-complementary-black-box-tool-for-monitoring-llm-internal-emotive-states-alongside-white-box-probe-methods
Numeric self-report is a viable, complementary black-box tool for monitoring LLM internal emotive states alongside white-box probe methods
Central practical conclusion; both methods partially track the same latent state but with different failure modes
Source paper
extracted_from · (2026) · Nicolas Martorell · Bruno Bianchi
Neighborhood — ranked by edge-count
Findings (7)
finding
- Weaker but still significant introspective coupling in the Gemma model; consistent with lower probe quality
- Strongest pooled introspective coupling across the four emotive concepts in the primary model
- Strong introspective coupling in the Qwen model; demonstrates cross-family generalization of introspective capacity
- Weakest but still significant pooled introspective coupling in the primary model
- Third-strongest pooled introspective coupling in the primary model
- Controls for probe artifacts; demonstrates self-reports carry information specifically about probe-defined concept directions
- Second-strongest pooled introspective coupling in the primary model
Claims (2)
claim
- Skeptical prior work motivating the need to validate self-reports against internal states rather than taking them at face value
- Skeptical prior work motivating validation framework
Questions (1)
question
- Can instruction-tuned LLMs perform quantitative introspection of emotive states in conversation? · gates · Central research question motivating the entire paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The coupling between LLM self-report and internal emotive state is causal, not merely correlational · claim · 0.825 · Supported by same-concept steering experiments showing monotonic shifts in self-report with activation steering
- Recommendation for companies on LM outputs.
- Primary tool in human psychometrics for tracking latent internal states; adapted as the core measure in this paper for LLMs
- Prior finding showing scale-dependent self-awareness, consistent with the scale effect observed in the paper's Experiment 1
- Normative-scientific claim about the alignment implications of Experiment 2's findings
- Claim supporting the validity of the probe construction method via cross-validation with self-report
- Explicit scope delimitation that situates the paper's claims within interpretability rather than consciousness science
- The core interpretive question the paper narrows but cannot definitively answer
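
The similarity filter described for this list (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the page's actual implementation: the function names `cosine` and `similar_untyped`, the dictionary layout of the embeddings, and the toy vectors are all assumptions made for the example; only the 0.65 threshold and the "no typed edge" condition come from the text above.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_untyped(target, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: list entities whose embedding is close to `target`
    but which have no typed relation to it, ranked by descending similarity.

    embeddings  : dict mapping entity id -> embedding vector (assumed layout)
    typed_edges : set of entity ids already linked to `target` by a typed edge
    threshold   : minimum cosine similarity (0.65 on this page)
    """
    results = []
    for entity_id, vec in embeddings.items():
        # Skip the entity itself and anything already connected by a typed edge.
        if entity_id == target or entity_id in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((entity_id, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy data, purely illustrative.
toy_embeddings = {
    "a": [1.0, 0.0],
    "b": [1.0, 0.1],   # near-parallel to "a"
    "c": [0.0, 1.0],   # orthogonal to "a"
    "d": [1.0, 0.0],   # identical to "a" but already linked by a typed edge
}
neighbors = similar_untyped("a", toy_embeddings, typed_edges={"d"})
```

In the toy data only `"b"` survives: `"c"` falls below the threshold and `"d"` is excluded because it already has a typed edge, which is the case the list above is meant to filter out.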