finding
active
Interest concept: Spearman ρ=0.76, isotonic R²=0.54 between logit self-report and probe score in LLaMA-3.2-3B (n=400)
Strongest pooled introspective coupling across the four emotive concepts in the primary model
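The two coupling statistics in the title can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' analysis code. Spearman ρ is computed as the Pearson correlation of average ranks. The isotonic R² assumes a non-decreasing isotonic fit of the self-report on the probe score (pool-adjacent-violators), scored as 1 − SS_res/SS_tot. The direction of the fit, the function names, and the toy inputs are hypothetical.

```python
import math


def average_ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of this tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation undefined for a constant input")
    return cov / math.sqrt(va * vb)


def spearman_rho(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def isotonic_fit(x, y):
    """Non-decreasing least-squares fit of y on x (pool adjacent violators)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    # Tied x values must share one fitted value, so pool them up front.
    groups = []
    for i in order:
        if groups and x[groups[-1][-1]] == x[i]:
            groups[-1].append(i)
        else:
            groups.append([i])
    blocks = []  # each block: [sum_y, count, member_indices]
    for g in groups:
        blocks.append([sum(y[i] for i in g), len(g), list(g)])
        # Merge backwards while the previous block's mean exceeds this one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c, idx = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][2].extend(idx)
    fitted = [0.0] * len(x)
    for s, c, idx in blocks:
        for i in idx:
            fitted[i] = s / c
    return fitted


def isotonic_r2(x, y):
    """R² of the isotonic fit: 1 - SS_res / SS_tot."""
    fitted = isotonic_fit(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    probe = [1, 2, 3]   # hypothetical probe scores
    report = [3, 1, 2]  # hypothetical self-reports
    print(spearman_rho(probe, report), isotonic_r2(probe, report))  # prints -0.5 0.0
```

Because the isotonic fit only assumes monotonicity, its R² can exceed a linear R² when the probe-report relationship is monotone but curved, which makes it a natural companion to a rank correlation.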
Source paper
extracted_from · 2026 · Nicolas Martorell, Bruno Bianchi
Neighborhood: ranked by edge count
Claims (1)
claim
- Central practical conclusion; both methods partially track the same latent state but with different failure modes
Questions (1)
question
- Can instruction-tuned LLMs perform quantitative introspection of emotive states in conversation? · answered_by · Central research question motivating the entire paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Weakest but still significant pooled introspective coupling in primary model
- Third-strongest pooled introspective coupling in primary model
- Second-strongest pooled introspective coupling in primary model
- LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰) · finding · 0.815 · Near-ceiling introspective performance for the wellbeing concept in the 8B model; nearly deterministic probe-report relationship
- Quantifies per-concept effect size of same-concept steering on self-report
- Unlike probe drift, report drift magnitude does not follow a clean scaling law; size-slope is negative
- Second significant cross-concept introspection improvement; only marginal after Benjamini-Hochberg (BH) correction (q≈0.066)
- Interest probe score drifts positively across turns: LMM slope=0.005, p=4.12×10⁻¹⁴ in LLaMA-3.2-3B · finding · 0.783 · Demonstrates genuine internal-state dynamics in LLMs during multi-turn conversation
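One neighbor above reports a q-value after BH correction. For reference, the standard Benjamini-Hochberg step-up adjustment turns a family of p-values into q-values as sketched below. This is a generic illustration only: which tests were pooled into the correction family is not stated on this page, and the function name and p-values in the example are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Return BH-adjusted q-values in the original order of `pvals`."""
    m = len(pvals)
    # Indices of the p-values sorted ascending; rank k below is 1-based.
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0  # starting at 1.0 caps every q-value at 1
    # Walk from the largest p-value down so that q_(k) is the minimum of
    # p_(j) * m / j over all j >= k, which keeps q-values monotone in p.
    for k in range(m, 0, -1):
        i = order[k - 1]
        running_min = min(running_min, pvals[i] * m / k)
        q[i] = running_min
    return q


if __name__ == "__main__":
    # Hypothetical p-values for four tests in one correction family.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A test counts as significant at FDR level α when its q-value is at most α, so a q of about 0.066 just misses the conventional 0.05 threshold, which is what "marginal" conveys.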