finding

active

finding:qwen-2-5-7b-turn-wise-introspective-fidelity-strong-at-turn-1-r2-0-90-but-declines-significantly-to-turn-10-r2-0-44-p-0-001

Qwen 2.5 7B turn-wise introspective fidelity: strong at turn 1 (R²≈0.90) but declines significantly to turn 10 (∆R²=-0.44, p=0.001)

Introspective fidelity erodes in Qwen 2.5 7B as conversations progress, in contrast to the LLaMA-3.2-3B trend
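As an illustration of how a turn-wise comparison like this one could be computed, the sketch below groups (turn, probe score, self-report) triples by turn, computes R² per turn, and takes the difference between turn 10 and turn 1. Everything in it is an assumption made for illustration: the data are synthetic, the function names are hypothetical, and R² is taken as the squared Pearson correlation, which may differ from the paper's estimator (an isotonic R² is reported for a related finding). The significance test behind the reported p-value is not reproduced.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def turnwise_r2(records):
    """Map each turn to R^2 (squared Pearson r) between probe score and self-report.

    `records` is an iterable of (turn, probe_score, self_report) triples.
    """
    by_turn = {}
    for turn, probe, report in records:
        probes, reports = by_turn.setdefault(turn, ([], []))
        probes.append(probe)
        reports.append(report)
    return {t: pearson(p, r) ** 2 for t, (p, r) in sorted(by_turn.items())}


def synthetic_records(n_conversations=200, n_turns=10, seed=0):
    """Synthetic data only: self-report noise grows with the turn index."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_conversations):
        for turn in range(1, n_turns + 1):
            probe = rng.gauss(0.0, 1.0)
            noise_sd = 0.3 + 0.08 * (turn - 1)  # arbitrary illustrative choice
            report = probe + rng.gauss(0.0, noise_sd)
            records.append((turn, probe, report))
    return records


r2_by_turn = turnwise_r2(synthetic_records())
delta_r2 = r2_by_turn[10] - r2_by_turn[1]  # negative when fidelity erodes
print(f"turn 1 R^2={r2_by_turn[1]:.2f}, turn 10 R^2={r2_by_turn[10]:.2f}, "
      f"delta={delta_r2:.2f}")
```

A negative `delta_r2` corresponds to the erosion pattern described above. The values printed by this sketch come from the synthetic generator and say nothing about the actual models.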

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Claims (1)

claim

Introspective ability is concept-specific: quality differs across emotive concepts and the same intervention helps some concepts but not others
supports
Cross-concept steering results; only 2 of 12 non-diagonal cells show significant introspection improvement

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
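The selection rule described above (cosine similarity at or above 0.65, with no typed edge to this entity) can be sketched as follows. This is a minimal illustration under stated assumptions: the entity names, the two-dimensional embeddings, and the `untyped_neighbors` helper are all hypothetical, and the real system's embedding model and edge storage are not described in this record.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs similar to `target` that lack a typed edge.

    `typed_edges` is a collection of (entity, entity) pairs, treated as undirected.
    Results are sorted by descending similarity.
    """
    linked = {frozenset(edge) for edge in typed_edges}
    hits = []
    for name, vector in embeddings.items():
        if name == target or frozenset((target, name)) in linked:
            continue  # skip the entity itself and anything already linked
        score = cosine(embeddings[target], vector)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy data, not taken from the knowledge graph.
embeddings = {
    "this_finding": [1.0, 0.0],
    "near_a": [0.9, 0.1],
    "near_b": [0.7, 0.5],
    "far": [0.0, 1.0],
    "linked_claim": [1.0, 0.05],
}
typed_edges = [("this_finding", "linked_claim")]

candidates = untyped_neighbors("this_finding", embeddings, typed_edges)
for name, score in candidates:
    print(f"{name}: {score:.3f}")
```

In this toy example `linked_claim` is excluded despite its high similarity because it already has a typed edge, and `far` is excluded because it falls below the threshold.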

Impulsivity introspective fidelity decreases from turn 1 to turn 10: ∆R²=-0.28 in LLaMA-3.2-3B · finding · 0.861
Opposite temporal trend to wellbeing/interest/focus; introspective fidelity weakens over conversation for impulsivity
Wellbeing introspective strength at turn 1: ρ=0.52, p=5.46×10⁻⁴ in LLaMA-3.2-3B · finding · 0.830
Demonstrates introspection is present from the first conversation turn without needing multi-turn context
Mean validated introspective fidelity across concept-model pairs: R²=0.12 (1B), 0.37 (3B), 0.61 (8B); pooled LMM β=0.29, p=5.55×10⁻⁹⁹ · finding · 0.821
Strong scaling trend for introspective fidelity when excluding invalid steering-sign pairs
Qwen 2.5 7B-Instruct wellbeing introspection: ρ=0.49, isotonic R²=0.76 (LMM p<10⁻¹⁰) · finding · 0.807
Strong introspective coupling in Qwen model; demonstrates cross-family generalization of introspective capacity
Wellbeing probe drift is positive in Gemma (ρ=0.34 pooled turn-correlation) and Qwen (ρ=0.24); both p<10⁻⁵ · finding · 0.778
Normalized probe-score drift across turns generalizes beyond LLaMA family
Qwen-2.5-3B ASR drops from 98.6% at dim 1 to 45.1% at dim 2, recovering partially then declining to 65.3% at dim 5 · finding · 0.770
Smaller models show non-monotonic and diminished ASR with increasing cone dimensionality
Interest introspection improves from 1B to 3B: ρ from 0.19 to 0.80, R² from 0.05 to 0.66 · finding · 0.765
Largest single-step scaling improvement; demonstrates dramatic introspection gain between 1B and 3B models for interest
Gemma 3 4B-IT wellbeing introspection: ρ=0.28, isotonic R²=0.11 (LMM p=1.33×10⁻¹³) · finding · 0.763
Weaker but still significant introspective coupling in Gemma model; consistent with lower probe quality