finding

active

finding:cross-concept-steering-impulsivity-interest-r2-increases-from-0-55-4-to-0-72-4-r2-0-10-p-0-012-in-llama-3-2-3b

Cross-concept steering: impulsivity→interest R² increases from 0.55 (α=-4) to 0.72 (α=+4), ∆R²=0.10, p=0.012 in LLaMA-3.2-3B

Second significant cross-concept introspection improvement; marginal after BH correction (q≈0.066)
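The step from p=0.012 to a marginal q can be illustrated with a minimal sketch of the Benjamini-Hochberg adjustment. It assumes a family of 12 tests (for example, the off-diagonal cells of a 4×4 matrix), and every p-value other than 0.012 is a hypothetical placeholder. Neither the family size nor the other p-values appear in this record, so the sketch is not expected to reproduce q≈0.066 exactly.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q is the minimum of
    # p * m / rank over this rank and all higher ranks (monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# ASSUMPTION: only 0.012 comes from the record above. The first value stands
# in for a "p<0.001" result, and the remaining ten are made-up placeholders
# for the non-significant cells.
pvals = [0.0009, 0.012, 0.20, 0.35, 0.41, 0.48, 0.55, 0.62, 0.70, 0.78, 0.85, 0.93]
q = bh_adjust(pvals)

for p, adj in zip(pvals[:2], q[:2]):
    print(f"p={p:.4f} -> q={adj:.4f}")
```

Under these assumptions the second-smallest p-value is multiplied by 12/2, which pushes it above the conventional 0.05 threshold even though the raw p-value is below it. That is the pattern the record describes as "marginal after BH correction".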

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Claims (2)

claim

Basal introspective performance is not always maximal, and some failure cases are solvable by representational intervention rather than reflecting a complete absence of introspective capacity
supports
Supported by cross-concept steering finding that focus→wellbeing steering dramatically improves introspection
Cross-concept introspection improvement is pair-specific rather than revealing a single globally tunable introspection faculty
supports
Most of the 4×4 cross-concept steering matrix shows no significant effect; only two conditions reach significance

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Cross-concept steering: focus→wellbeing R² increases from 0.30 (α=-4) to 0.76 (α=+4), ∆R²=0.30, p<0.001 in LLaMA-3.2-3B · finding · 0.895
Strongest cross-concept introspection improvement; survives BH correction (q≈0.011)
Impulsivity→interest: ρ increases from 0.70 (α=-4) to 0.83 (α=+4); R² from 0.46 to 0.69 in LLaMA-3.2-3B · finding · 0.875
Scatter plot visualization showing strengthened probe-report relationship across alpha range
Impulsivity concept: Spearman ρ=0.51, isotonic R²=0.31 in LLaMA-3.2-3B (n=400, p<10⁻¹²) · finding · 0.853
Third-strongest pooled introspective coupling in primary model
Wellbeing same-concept steering: LMM alpha slope=0.19, focus=0.40, interest=0.25, impulsivity=0.067 in LLaMA-3.2-3B · finding · 0.848
Quantifies per-concept effect size of same-concept steering on self-report
Impulsivity→interest steering: probe entropy increases (LMM slope=0.024, p=2.30×10⁻⁴) but report entropy does not (p=0.11) · finding · 0.843
Evidence of a bottleneck between richer internal variation and final report distribution in impulsivity→interest condition
LLaMA-3.2-1B impulsivity introspection: ρ=0.21, p<10⁻⁴ (significant but weaker than 3B ρ=0.52) · finding · 0.829
Impulsivity shows significant introspection in 1B but declines in 8B; non-monotonic scaling
Impulsivity introspective fidelity decreases from turn 1 to turn 10: ∆R²=-0.28 in LLaMA-3.2-3B · finding · 0.819
Opposite temporal trend to wellbeing/interest/focus; introspective fidelity weakens over conversation for impulsivity
Impulsivity probe: peak Cohen's d=3.60 (layer 13), p=3.58×10⁻¹³ in LLaMA-3.2-3B · finding · 0.798
Strongest probe validation result; highest Cohen's d among the four concepts