finding
active
finding:gemma-3-4b-it-wellbeing-introspection-0-28-isotonic-r2-0-11-lmm-p-1-33-10-13
Gemma 3 4B-IT wellbeing introspection: ρ=0.28, isotonic R²=0.11 (LMM p=1.33×10⁻¹³)
Weaker but still significant introspective coupling in the Gemma model, consistent with its lower probe quality.
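The headline statistics pair a rank correlation (Spearman ρ) with an isotonic R². The isotonic R² is the share of variance in the self-reports explained by the best monotone non-decreasing function of the probe score. The sketch below shows a minimal version of both quantities. It assumes one probe score and one self-report per item, and distinct probe scores. The function names are hypothetical. The source's exact estimator is not given here, and the LMM p-value is not reproduced.

```python
from statistics import mean


def ranks(xs):
    """1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rho = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def isotonic_fit(x, y):
    """Pool-adjacent-violators: best non-decreasing fit of y ordered by x.
    Assumption: x values are distinct (tied x values are not pooled here)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    blocks = []  # each block is [sum of y, count]
    for i in order:
        blocks.append([y[i], 1])
        # merge backwards while a block's mean exceeds the next block's mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted_sorted = []
    for s, c in blocks:
        fitted_sorted.extend([s / c] * c)
    fitted = [0.0] * len(x)
    for pos, i in enumerate(order):
        fitted[i] = fitted_sorted[pos]
    return fitted


def isotonic_r2(x, y):
    """1 - SS_res / SS_tot, with residuals taken from the isotonic fit."""
    fitted = isotonic_fit(x, y)
    my = mean(y)
    ss_res = sum((b - f) ** 2 for b, f in zip(y, fitted))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot
```

Take five items whose reports follow the probe except for one swapped pair (`x=[1,2,3,4,5]`, `y=[1,3,2,4,5]`). They give ρ = 0.9 and isotonic R² = 0.95. Isotonic R² rewards any monotone relationship, not only a linear one, while still penalizing reversals. This may be why it is reported next to ρ.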
Source paper
extracted_from (2026) · Nicolas Martorell · Bruno Bianchi
Neighborhood (ranked by edge count)
Claims (1)
claim
- Central practical conclusion: both methods partially track the same latent state, but with different failure modes
Findings (1)
finding
- Weaker cross-family probe, which explains the weaker introspection in Gemma
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Strong introspective coupling in the Qwen model; demonstrates cross-family generalization of introspective capacity
- LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰) · finding · cosine 0.841 · Near-ceiling introspective performance for the wellbeing concept in the 8B model; nearly deterministic probe-report relationship
- Wellbeing introspection improves from 1B to 3B: ρ from 0.48 to 0.66, R² from 0.26 to 0.45 · finding · cosine 0.836 · Confirms the scaling trend for the wellbeing concept between the smallest and middle model sizes
- Wellbeing probe drift is positive in Gemma (ρ=0.34 pooled turn-correlation) and Qwen (ρ=0.24); both p<10⁻⁵ · finding · cosine 0.808 · Normalized probe-score drift across turns generalizes beyond the LLaMA family
- Demonstrates that introspection is present from the first conversation turn, without requiring multi-turn context
- Strong scaling trend for introspective fidelity when excluding invalid steering-sign pairs
- Second-strongest pooled introspective coupling in the primary model
- Interest introspection improves from 1B to 3B: ρ from 0.19 to 0.80, R² from 0.05 to 0.66 · finding · cosine 0.786 · Largest single-step scaling improvement; demonstrates a dramatic introspection gain between the 1B and 3B models for interest
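The "Related by similarity" list surfaces entities whose embeddings lie close to this one (cosine ≥ 0.65) but that share no typed edge with it. The sketch below shows a minimal version of that filter. It assumes each entity has a precomputed embedding vector and that typed edges are stored as undirected (source, target) pairs. The names `similarity_candidates`, `embeddings`, and `typed_edges` are hypothetical. The tool's actual embedding model and ranking are not specified here.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def similarity_candidates(target_id, embeddings, typed_edges, threshold=0.65):
    """Hypothetical filter: entities with cosine >= threshold to the target
    and no typed edge in either direction, sorted by descending similarity."""
    target = embeddings[target_id]
    linked = {b for a, b in typed_edges if a == target_id}
    linked |= {a for a, b in typed_edges if b == target_id}
    out = []
    for eid, vec in embeddings.items():
        if eid == target_id or eid in linked:
            continue  # skip self and entities already connected by a typed edge
        score = cosine(target, vec)
        if score >= threshold:
            out.append((eid, score))
    out.sort(key=lambda pair: pair[1], reverse=True)
    return out
```

Excluding typed neighbors keeps the list focused on what the page describes: candidate new edges or unrecognized duplicates, rather than relations already recorded.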