finding

active

finding:gemma-3-4b-it-wellbeing-introspection-0-28-isotonic-r2-0-11-lmm-p-1-33-10-13

Gemma 3 4B-IT wellbeing introspection: ρ=0.28, isotonic R²=0.11 (LMM p=1.33×10⁻¹³)

Weaker but still statistically significant introspective coupling in the Gemma model, consistent with its lower probe quality
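The two headline statistics, Spearman ρ and isotonic R², both measure how well a monotone relationship links probe scores to numeric self-reports. The following is a minimal sketch of how such numbers could be computed, not the paper's actual pipeline: the function names and the toy `probe` and `report` values are hypothetical, ties in the probe scores are not specially handled in the isotonic fit, and the LMM significance test is not reproduced. In practice, `scipy.stats.spearmanr` and `sklearn.isotonic.IsotonicRegression` cover the same ground.

```python
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def isotonic_fit(x, y):
    """Pool-adjacent-violators: least-squares non-decreasing fit of y on x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    blocks = []  # each block holds [sum of y, count]
    for i in order:
        blocks.append([y[i], 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted_sorted = [s / c for s, c in blocks for _ in range(c)]
    fitted = [0.0] * len(x)
    for pos, i in enumerate(order):
        fitted[i] = fitted_sorted[pos]
    return fitted

def isotonic_r2(x, y):
    """Share of variance in y explained by the best non-decreasing fit on x."""
    fit = isotonic_fit(x, y)
    my = mean(y)
    ss_res = sum((b - f) ** 2 for b, f in zip(y, fit))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical toy data: probe activations and 0-10 self-reports.
probe = [0.1, 0.4, 0.2, 0.9, 0.7]
report = [3, 5, 4, 8, 6]
rho = spearman_rho(probe, report)
r2 = isotonic_r2(probe, report)
```

Read this way, a pairing such as ρ=0.28 with isotonic R²=0.11 indicates a rank relationship that is present but explains little of the variance in the self-reports, even under the most favorable monotone mapping.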

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Claims (1)

claim

Numeric self-report is a viable, complementary black-box tool for monitoring LLM internal emotive states alongside white-box probe methods
supports
Central practical conclusion; both methods partially track the same latent state but with different failure modes

Findings (1)

finding

Gemma 3 4B wellbeing probe: peak Cohen's d=1.8
supports
Weaker cross-family probe; explains weaker introspection in Gemma

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Qwen 2.5 7B-Instruct wellbeing introspection: ρ=0.49, isotonic R²=0.76 (LMM p<10⁻¹⁰) · finding · 0.857
Strong introspective coupling in Qwen model; demonstrates cross-family generalization of introspective capacity
LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰) · finding · 0.841
Near-ceiling introspective performance for wellbeing concept in 8B model; nearly deterministic probe-report relationship
Wellbeing introspection improves from 1B to 3B: ρ from 0.48 to 0.66, R² from 0.26 to 0.45 · finding · 0.836
Confirms scaling trend for wellbeing concept between smallest and middle model size
Wellbeing probe drift is positive in Gemma (ρ=0.34 pooled turn-correlation) and Qwen (ρ=0.24); both p<10⁻⁵ · finding · 0.808
Normalized probe-score drift across turns generalizes beyond LLaMA family
Wellbeing introspective strength at turn 1: ρ=0.52, p=5.46×10⁻⁴ in LLaMA-3.2-3B · finding · 0.806
Demonstrates introspection is present from the first conversation turn without needing multi-turn context
Mean validated introspective fidelity across concept-model pairs: R²=0.12 (1B), 0.37 (3B), 0.61 (8B); pooled LMM β=0.29, p=5.55×10⁻⁹⁹ · finding · 0.806
Strong scaling trend for introspective fidelity when excluding invalid steering-sign pairs
Wellbeing concept: Spearman ρ=0.68, isotonic R²=0.48 in LLaMA-3.2-3B (n=400, p<10⁻²⁶) · finding · 0.803
Second-strongest pooled introspective coupling in primary model
Interest introspection improves from 1B to 3B: ρ from 0.19 to 0.80, R² from 0.05 to 0.66 · finding · 0.786
Largest single-step scaling improvement; demonstrates dramatic introspection gain between 1B and 3B models for interest
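
The similarity list above pairs each entity with a cosine score at or above 0.65 and excludes entities that already have a typed edge. A minimal sketch of that kind of filter follows. It assumes each entity has an embedding vector and that typed edges are stored as a mapping from entity to neighbor set; the names `similar_untyped`, `embeddings`, and `typed_edges`, and the toy vectors, are hypothetical and do not describe how this page is actually generated.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def similar_untyped(target, embeddings, typed_edges, threshold=0.65):
    """Entities with cosine >= threshold to target and no typed edge to it,
    sorted by descending similarity."""
    linked = typed_edges.get(target, set())
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue  # skip the entity itself and already-typed neighbors
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings and edges.
embeddings = {
    "gemma-introspection": [1.0, 0.0],
    "qwen-introspection": [0.9, 0.1],
    "gemma-probe": [1.0, 0.05],
    "unrelated": [0.0, 1.0],
}
typed_edges = {"gemma-introspection": {"gemma-probe"}}
candidates = similar_untyped("gemma-introspection", embeddings, typed_edges)
```

Here `gemma-probe` is dropped despite its high similarity because it already has a typed edge, which is why such a list surfaces only candidates for new edges or unrecognized duplicates.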