finding
active
finding:gemma-3-4b-wellbeing-probe-peak-cohen-s-d-1-8
Gemma 3 4B wellbeing probe: peak Cohen's d=1.8
Weaker cross-family probe; explains weaker introspection in Gemma
Source paper
extracted_from(2026) · Nicolas Martorell · Bianchi, Bruno
Neighborhood — ranked by edge-count
Findings (1)
finding
- Weaker but still significant introspective coupling in Gemma model; consistent with lower probe quality
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Strongest cross-family probe; explains clearer introspection in Qwen than Gemma
- Probe validation result confirming wellbeing direction captures meaningful structure
- Strongest probe validation result; highest Cohen's d among the four concepts
- Probe validation result confirming interest direction captures meaningful structure
- Wellbeing probe drift is positive in Gemma (ρ=0.34 pooled turn-correlation) and Qwen (ρ=0.24); both p<10⁻⁵ · finding · 0.797 · Normalized probe-score drift across turns generalizes beyond LLaMA family
- Gemma-3-4B-it shows three-stage layer trajectory and S(ℓ) peak despite scale differences in dr and ρd · finding · 0.772 · E3 backbone generalization finding for Gemma; validates pattern across diverse architectures
- Identified exception to overall MDS effectiveness; reason remains unexplained as a limitation
- Gemma-2-27B-it deceptive response rate reduced from 100% to 9.36% ± 7.09% after SOO fine-tuning · finding · 0.753 · Primary result showing SOO fine-tuning significantly reduces deception in Gemma-2-27B