finding
active
finding:impulsivity-introspective-fidelity-decreases-from-turn-1-to-turn-10-r2-0-28-in-llama-3-2-3b
Impulsivity introspective fidelity decreases from turn 1 to turn 10: ∆R² = -0.28 in LLaMA-3.2-3B
Opposite temporal trend to wellbeing, interest, and focus: for impulsivity, introspective fidelity weakens over the course of the conversation
Source paper
extracted_from (2026) · Nicolas Martorell · Bianchi, Bruno
Neighborhood — ranked by edge-count
Claims (1)
claim
- Cross-concept steering results; only 2 of 12 non-diagonal cells show significant introspection improvement
Concepts (1)
concept
- Introspective fidelity · supports · Isotonic R² measuring the fraction of variance in self-report explained by probe score under a monotonicity assumption; the paper's primary fidelity metric
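The isotonic R² described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a non-decreasing fit computed with the pool-adjacent-violators algorithm, and the names `isotonic_fit`, `isotonic_r2`, and `delta_r2` are hypothetical. Tied probe scores are not pooled, and a constant self-report series (zero variance) is not handled.

```python
def isotonic_fit(x, y):
    """Least-squares non-decreasing fit of y on x (pool-adjacent-violators)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    # Each block stores [sum of y, count]; merge while block means decrease.
    blocks = []
    for i in order:
        blocks.append([y[i], 1])
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    # Expand block means back to one fitted value per sorted point.
    fitted_sorted = []
    for s, n in blocks:
        fitted_sorted.extend([s / n] * n)
    # Restore the original ordering of the inputs.
    fitted = [0.0] * len(x)
    for rank, i in enumerate(order):
        fitted[i] = fitted_sorted[rank]
    return fitted


def isotonic_r2(probe, report):
    """Fraction of variance in `report` explained by a monotone fit on `probe`."""
    fitted = isotonic_fit(probe, report)
    mean_r = sum(report) / len(report)
    ss_tot = sum((r - mean_r) ** 2 for r in report)
    ss_res = sum((r - f) ** 2 for r, f in zip(report, fitted))
    return 1.0 - ss_res / ss_tot


def delta_r2(probe_first, report_first, probe_last, report_last):
    """Change in isotonic R² between two turns (last minus first)."""
    return isotonic_r2(probe_last, report_last) - isotonic_r2(probe_first, report_first)
```

Under this sketch, a negative `delta_r2` between turn 1 and turn 10 corresponds to the weakening fidelity this finding reports; the inputs would be per-sample probe scores and self-reports at each turn.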
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Introspective fidelity erodes in Qwen as conversations progress; contrasts with LLaMA-3B trend
- LLaMA-3.2-1B impulsivity introspection: ρ=0.21, p<10⁻⁴ (significant but weaker than 3B ρ=0.52) · finding · 0.834 · Impulsivity shows significant introspection in 1B but declines in 8B; non-monotonic scaling
- Impulsivity→interest: ρ increases from 0.70 (α=-4) to 0.83 (α=+4); R² from 0.46 to 0.69 in LLaMA-3.2-3B · finding · 0.831 · Scatter plot visualization showing strengthened probe-report relationship across alpha range
- Second significant cross-concept introspection improvement; marginal after BH correction (q≈0.066)
- Demonstrates introspection is present from the first conversation turn without needing multi-turn context
- Third-strongest pooled introspective coupling in primary model
- LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰) · finding · 0.786 · Near-ceiling introspective performance for wellbeing concept in 8B model; nearly deterministic probe-report relationship
- Strong scaling trend for introspective fidelity when excluding invalid steering-sign pairs
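The BH correction mentioned in the list above refers to the Benjamini-Hochberg procedure, which converts a family of p-values into q-values (adjusted p-values controlling the false discovery rate). A minimal sketch follows, assuming the standard step-up formulation; the function name `benjamini_hochberg` is hypothetical, and any p-values passed in would be illustrative rather than taken from the source paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as `p_values`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

A result is called significant at FDR level 0.05 when its q-value is at most 0.05, so a q-value near 0.066 falls just outside that threshold, consistent with the "marginal" label above.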