finding

active

finding:wellbeing-probe-peak-cohen-s-d-3-34-layer-16-p-7-21-10-13-in-llama-3-2-3b

Wellbeing probe: peak Cohen's d=3.34 (layer 16), p=7.21×10⁻¹³ in LLaMA-3.2-3B

Probe validation result confirming wellbeing direction captures meaningful structure

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Concepts (1)

concept

Wellbeing probe (sad vs. happy)
supports
One of four emotive concept probes trained; contrastive pair sad/happy with best layer 16 in LLaMA-3.2-3B
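For readers unfamiliar with the effect-size metric quoted above, the following is a minimal sketch of how Cohen's d could be computed from probe projections of the two contrastive classes. It assumes the pooled-standard-deviation variant of Cohen's d; the source does not state which variant was used. The function name `cohens_d` and the score lists are hypothetical, illustrative values, not data from the paper.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical probe projections at one layer; NOT values from the paper.
happy_scores = [2.0, 3.0, 4.0]
sad_scores = [0.0, 1.0, 2.0]

d = cohens_d(happy_scores, sad_scores)
print(f"Cohen's d = {d:.2f}")
```

Under this reading, a "peak" d such as the layer-16 value would be the largest d obtained when the same calculation is repeated for each layer's projections.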

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Interest probe: peak Cohen's d=1.67 (layer 14), p=9.45×10⁻⁶ in LLaMA-3.2-3B · finding · 0.885
Probe validation result confirming interest direction captures meaningful structure
Qwen 2.5 7B wellbeing probe: peak Cohen's d=3.5 · finding · 0.883
Strongest cross-family probe; explains clearer introspection in Qwen than Gemma
Gemma 3 4B wellbeing probe: peak Cohen's d=1.8 · finding · 0.880
Weaker cross-family probe; explains weaker introspection in Gemma
Impulsivity probe: peak Cohen's d=3.60 (layer 13), p=3.58×10⁻¹³ in LLaMA-3.2-3B · finding · 0.870
Strongest probe validation result; highest Cohen's d among the four concepts
LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰) · finding · 0.827
Near-ceiling introspective performance for wellbeing concept in 8B model; nearly deterministic probe-report relationship
Wellbeing probe-score drift across turns significant at all three LLaMA scales (slopes=0.006, 0.005, 0.013 for 1B, 3B, 8B; all p<10⁻¹⁰); drift magnitude increases with scale · finding · 0.827
Internal-state drift generalizes across scales; normalized drift also increases significantly with log(model size)
Wellbeing concept: Spearman ρ=0.68, isotonic R²=0.48 in LLaMA-3.2-3B (n=400, p<10⁻²⁶) · finding · 0.793
Second-strongest pooled introspective coupling in primary model
Wellbeing probe drift is positive in Gemma (ρ=0.34 pooled turn-correlation) and Qwen (ρ=0.24); both p<10⁻⁵ · finding · 0.778
Normalized probe-score drift across turns generalizes beyond LLaMA family