finding

active

finding:interest-concept-spearman-0-76-isotonic-r2-0-54-between-logit-self-report-and-probe-score-in-llama-3-2-3b-n-400

Interest concept: Spearman ρ=0.76, isotonic R²=0.54 between logit self-report and probe score in LLaMA-3.2-3B (n=400)

Strongest pooled introspective coupling across the four emotive concepts in the primary model
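
As a rough illustration of the two statistics in this finding, the sketch below computes a Spearman rank correlation and the R² of a monotone (isotonic) fit for paired probe scores and self-report values. It is a minimal standard-library sketch, not the paper's analysis code: the function names, the toy data, and the tie handling in the isotonic fit (ties in x are not pooled unless they violate monotonicity) are assumptions made for illustration.

```python
from statistics import mean

def average_ranks(values):
    """Rank values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman_rho(x, y):
    """Spearman rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def isotonic_fit(x, y):
    """Non-decreasing least-squares fit of y on x (pool-adjacent-violators)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    blocks = []  # each block holds [sum of y, count]
    for i in order:
        blocks.append([y[i], 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = [0.0] * len(x)
    pos = 0
    for s, c in blocks:
        for i in order[pos:pos + c]:
            fitted[i] = s / c
        pos += c
    return fitted

def isotonic_r2(x, y):
    """R^2 of the isotonic fit: 1 - SS_res / SS_tot."""
    fitted = isotonic_fit(x, y)
    my = mean(y)
    ss_res = sum((b - f) ** 2 for b, f in zip(y, fitted))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Toy data (hypothetical, not from the paper): probe scores and self-reports.
probe = [0.1, 0.4, 0.2, 0.8, 0.6]
report = [1.0, 2.0, 3.0, 5.0, 4.0]
print(round(spearman_rho(probe, report), 3), round(isotonic_r2(probe, report), 3))
```

Read together, ρ measures how well the ordering of self-reports follows the ordering of probe scores, while the isotonic R² measures how much of the self-report variance a monotone function of the probe score can explain. This is why a finding can show a fairly high ρ alongside a more modest R².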

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Claims (1)

claim

Numeric self-report is a viable, complementary black-box tool for monitoring LLM internal emotive states alongside white-box probe methods
supports
Central practical conclusion; both methods partially track the same latent state but with different failure modes

Questions (1)

question

Can instruction-tuned LLMs perform quantitative introspection of emotive states in conversation?
answered_by
Central research question motivating the entire paper

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Focus concept: Spearman ρ=0.40, isotonic R²=0.12 in LLaMA-3.2-3B (n=400, p<10⁻⁵) · finding · 0.896
Weakest but still significant pooled introspective coupling in primary model
Impulsivity concept: Spearman ρ=0.51, isotonic R²=0.31 in LLaMA-3.2-3B (n=400, p<10⁻¹²) · finding · 0.877
Third-strongest pooled introspective coupling in primary model
Wellbeing concept: Spearman ρ=0.68, isotonic R²=0.48 in LLaMA-3.2-3B (n=400, p<10⁻²⁶) · finding · 0.857
Second-strongest pooled introspective coupling in primary model
LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰) · finding · 0.815
Near-ceiling introspective performance for wellbeing concept in 8B model; nearly deterministic probe-report relationship
Same-concept steering in LLaMA-3.2-3B: LMM alpha slope is 0.19 for wellbeing, 0.40 for focus, 0.25 for interest, and 0.067 for impulsivity · finding · 0.790
Quantifies per-concept effect size of same-concept steering on self-report
Logit self-report drift positive for all three LLaMA sizes (turn slopes 0.159, 0.038, 0.141; all p<10⁻²⁰) but does not increase monotonically with scale · finding · 0.788
Unlike probe drift, report drift magnitude does not follow a clean scaling law; size-slope is negative
Cross-concept steering: impulsivity→interest R² increases from 0.55 (α=-4) to 0.72 (α=+4), ∆R²=0.10, p=0.012 in LLaMA-3.2-3B · finding · 0.785
Second significant cross-concept introspection improvement; marginal after BH correction (q≈0.066)
Interest probe score drifts positively across turns: LMM slope=0.005, p=4.12×10⁻¹⁴ in LLaMA-3.2-3B · finding · 0.783
Demonstrates genuine internal-state dynamics in LLMs during multi-turn conversation
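
One neighbor above reports a result that is "marginal after BH correction". For readers unfamiliar with that step, the sketch below shows how Benjamini-Hochberg adjusted q-values are typically computed from a family of raw p-values. The p-values are hypothetical placeholders, and the size and composition of the paper's actual test family are not stated here, so this does not attempt to reproduce the reported q≈0.066.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical family of four tests (illustrative values only).
raw_p = [0.01, 0.04, 0.03, 0.005]
print([round(v, 4) for v in benjamini_hochberg(raw_p)])
```

A raw p-value that clears 0.05 on its own can end up with q above 0.05 once it is scaled by the number of tests over its rank, which is the situation the neighbor's description points to.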