claim
active

When probe and self-report agree and move together causally, confidence in both increases as evidence they track the same underlying state

Applies convergent-validity logic to LLM interpretability: agreement between a probe and a self-report validates each measure against the other.
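
The convergent-validity check described above can be sketched in a few lines. This is an illustrative sketch only, not the source paper's procedure: the function names, the `min_r` threshold of 0.5, and the idea of comparing a baseline condition with an intervened ("steered") condition are all assumptions made for the example. It tests two things: whether probe readouts and self-reports correlate, and whether an intervention shifts both in the same direction.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def moves_together(probe_base, probe_steered, report_base, report_steered):
    """True if the intervention shifts mean probe and mean report the same way."""
    d_probe = mean(probe_steered) - mean(probe_base)
    d_report = mean(report_steered) - mean(report_base)
    return d_probe * d_report > 0


def convergent_evidence(probe_base, probe_steered,
                        report_base, report_steered, min_r=0.5):
    """Agreement (correlation) plus causal co-movement.

    min_r is a hypothetical threshold chosen for illustration.
    """
    r = pearson(probe_base + probe_steered, report_base + report_steered)
    return r >= min_r and moves_together(
        probe_base, probe_steered, report_base, report_steered
    )


# Made-up numbers for illustration; not data from the paper.
probe_base = [0.10, 0.20, 0.15, 0.30]
report_base = [2, 3, 2, 4]
probe_steered = [0.60, 0.70, 0.65, 0.80]
report_steered = [6, 7, 7, 8]

supported = convergent_evidence(
    probe_base, probe_steered, report_base, report_steered
)
```

A positive result here raises confidence that both measures track the same underlying state; it does not prove it, since the two could still share a confound.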

Source paper

extracted_from
Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation
(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • Framework borrowed from human metacognition research: when probe and self-report agree, confidence in both increases as they partially track the same underlying state

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.