finding

active

finding:cross-concept-steering-focus-wellbeing-r2-increases-from-0-30-4-to-0-76-4-r2-0-30-p-0-001-in-llama-3-2-3b

Cross-concept steering: focus→wellbeing R² increases from 0.30 (α=-4) to 0.76 (α=+4), ∆R²=0.30, p<0.001 in LLaMA-3.2-3B

Strongest cross-concept introspection improvement; survives BH correction (q≈0.011)
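For readers unfamiliar with the q-values quoted here, the Benjamini-Hochberg (BH) adjustment can be sketched as follows. This is a minimal illustration, not the paper's analysis code: the function name `benjamini_hochberg` and the example p-values are hypothetical placeholders, and the size of the test family the paper actually corrected over is an assumption left to the caller.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (q-values) in the original input order.

    For the p-value with ascending rank i out of m tests, the raw adjusted
    value is p * m / i. Taking a running minimum from the largest rank down
    enforces that q-values never decrease as p-values increase.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for illustration only (NOT taken from the paper).
example_p = [0.01, 0.04, 0.03, 0.5]
example_q = benjamini_hochberg(example_p)
```

A result "survives BH correction" at a chosen false discovery rate when its q-value falls below that rate.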

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Claims (2)

claim

Basal introspective performance is not always maximal: some failure cases can be resolved by representational intervention and therefore do not reflect a complete absence of introspective capacity
supports
Supported by the cross-concept steering finding that focus→wellbeing steering substantially improves introspection
Introspective ability is concept-specific: quality differs across emotive concepts and the same intervention helps some concepts but not others
supports
Cross-concept steering results; only 2 of 12 non-diagonal cells show significant introspection improvement

Hypotheses (1)

hypothesis

There may exist a global introspective faculty or steering direction that improves introspection uniformly across all concepts
associated_with
Framed as an open problem; current evidence only points to local pair-specific improvement

Questions (1)

question

If introspective ability exists, can it be improved?
answered_by
Secondary research question addressed through cross-concept steering experiments

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Cross-concept steering: impulsivity→interest R² increases from 0.55 (α=-4) to 0.72 (α=+4), ∆R²=0.10, p=0.012 in LLaMA-3.2-3B · finding · 0.895
Second significant cross-concept introspection improvement; marginal after BH correction (q≈0.066)
Focus→wellbeing: ρ increases from 0.42 (α=-4) to 0.85 (α=+4); R² from 0.34 to 0.75 in LLaMA-3.2-3B · finding · 0.870
Scatter plot visualization of the dramatic tightening of probe-report relationship at extreme steering settings
Wellbeing same-concept steering: LMM alpha slope=0.19, focus=0.40, interest=0.25, impulsivity=0.067 in LLaMA-3.2-3B · finding · 0.842
Quantifies per-concept effect size of same-concept steering on self-report
Focus→wellbeing steering: both probe entropy (1.09→1.67 bits) and report entropy (0.88→1.69 bits) increase monotonically with α · finding · 0.811
Evidence that improved introspection in focus→wellbeing arises from enriched internal state and report channels simultaneously
Focus concept: Spearman ρ=0.40, isotonic R²=0.12 in LLaMA-3.2-3B (n=400, p<10⁻⁵) · finding · 0.796
Weakest but still significant pooled introspective coupling in primary model
Llama-3.3-70B corrected response scores 75/100 rather than 100 due to residual steering effects (Snell's law reference) · finding · 0.775
Illustrative finding that ESR mitigates but does not fully eliminate steering influence
Wellbeing concept: Spearman ρ=0.68, isotonic R²=0.48 in LLaMA-3.2-3B (n=400, p<10⁻²⁶) · finding · 0.775
Second-strongest pooled introspective coupling in primary model
Steering base models toward the Assistant Axis increases agreeableness traits (friendly, kind, helpful) and decreases extraversion in Gemma and openness in Llama · finding · 0.773
Characterizes the trait content of the Assistant Axis in pre-trained models