finding
active
finding:correlation-between-layer-wise-scores-and-task-accuracy-0-73-p-0-001-on-llama
Correlation between layer-wise scores and task accuracy: ρ = −0.73 (p < 0.001) on LLaMA
Core E3 finding validating S as a predictor of anchoring effectiveness
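As a minimal sketch of how a correlation like this could be computed, the snippet below assumes ρ denotes Spearman's rank correlation, which this card does not confirm. The function names and the six score/accuracy pairs are hypothetical illustrations, not data from the source paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-condition values, NOT taken from the source paper.
scores = [-1.9, -1.6, -1.2, -0.8, -0.5, -0.1]    # layer-wise score S
accuracy = [0.71, 0.74, 0.66, 0.60, 0.62, 0.51]  # task accuracy

rho = spearman_rho(scores, accuracy)
print(f"rho = {rho:.3f}")
```

A negative ρ, as in the reported finding, means that more negative scores tend to go with higher task accuracy. The p-value is not computed here; a statistics library would normally supply it.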
Source paper
extracted_from · (2025) · Edward Yi Chang · Kaya, Zeyneb N. · Ethan Chang
Neighborhood — ranked by edge-count
Hypotheses (1)
hypothesis
- E3 prediction that internal geometry provides a bridge to behavioral thresholds
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Shows S predicts anchoring effectiveness.
- Math and code tasks show strongest mid-layer anchoring on LLaMA (S ≈ −1.65 at layers 8-12) · finding · 0.812 · Task-specific E3 finding showing compositional reasoning requires deeper processing
- Illustrative finding that ESR mitigates but does not fully eliminate steering influence
- Shows behavioral pattern of self-correction is trainable in smaller models
- Connects this study's results to Schrimpf et al. 2021 and Caucheteux et al. 2022/2023 findings on brain-LLM alignment.
- Validates representational drift theory: later layers specialize for next-token prediction, increasing dr
- LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰) · finding · 0.787 · Near-ceiling introspective performance for wellbeing concept in 8B model; nearly deterministic probe-report relationship
- LLaMA-3.1-8B: Sbmax = -1.896 ± 0.211, AUSN = -2.119 ± 0.198, peak layer ℓ* = 10 (median) · finding · 0.783 · Seed-pooled geometry-only statistics (per-dev z units).
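
The "related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is an illustration under stated assumptions, not the system's actual implementation: the function names, the entity names and the two-dimensional embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """List entities similar to `target` that have no typed edge to it.

    Returns (name, similarity) pairs sorted by similarity, highest first.
    """
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in typed_edges:
            continue  # skip the entity itself and already-linked entities
        sim = cosine(embeddings[target], vec)
        if sim >= threshold:
            hits.append((name, sim))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real entity embeddings would be much larger.
embeddings = {
    "A": [1.0, 0.0],
    "B": [1.0, 1.0],   # similar enough, no typed edge
    "C": [0.0, 1.0],   # orthogonal to A, falls below the threshold
    "D": [2.0, 0.1],   # very similar, but already has a typed edge
    "E": [3.0, 1.0],   # similar, no typed edge
}
typed_edges = {"D"}

neighbors = untyped_neighbors("A", embeddings, typed_edges)
for name, sim in neighbors:
    print(f"{name}: {sim:.3f}")
```

Entities that pass the threshold but are excluded for having a typed edge (here `D`) are the ones already linked in the graph; the remaining hits are the candidates for new edges or duplicate checks.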