finding

active

finding:correlation-between-layer-wise-scores-and-task-accuracy-0-73-p-0-001-on-llama

Correlation between layer-wise scores and task accuracy ρ = −0.73 (p < 0.001) on LLaMA

Core E3 finding validating S as a predictor of anchoring effectiveness
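The headline statistic is a rank correlation between S scores and task accuracy. The sketch below shows how such a ρ and a p-value can be computed from paired observations. It makes three assumptions. First, ρ is taken to be Spearman's rank correlation, which the page does not confirm. Second, each pair is one (S, accuracy) observation; the page does not state whether the paper pairs by layer or by task. Third, the two-sided permutation test is a stand-in, because the paper's significance test is not stated here. The function names are hypothetical, and the example data are invented.

```python
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def permutation_p_value(xs, ys, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for rho (a stand-in, not the paper's test)."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so exact ties with the observed |rho| are counted.
        if abs(spearman_rho(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # +1 in numerator and denominator counts the observed labeling itself.
    return (hits + 1) / (n_perm + 1)


# Illustrative paired values (made up, not taken from the paper).
s_scores = [-1.2, -1.5, -1.7, -1.9, -2.1, -2.4]
accuracy = [0.52, 0.60, 0.58, 0.71, 0.75, 0.80]
rho = spearman_rho(s_scores, accuracy)
p = permutation_p_value(s_scores, accuracy, n_perm=2_000)
print(f"rho = {rho:.3f}, permutation p = {p:.4f}")
```

For these invented values, ρ = −16.5/17.5 ≈ −0.943. A negative ρ only says that lower S pairs with higher accuracy. The page does not restate S's sign convention, so the sketch does not interpret the direction.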

Source paper

extracted_from

The Guanyin Protocol: A Framework for Immediately Establishing an Understanding of Both Causality and Compassion in LLM Systems Using Semantic Anchoring

(2025) · Edward Yi Chang · Zeyneb N. Kaya · Ethan Chang

Neighborhood (ranked by edge count)

Hypotheses (1)

hypothesis

Hypothesis: Peak alignment location S_max and normalized trajectory area AUS_N predict shot midpoints θ_50
supports
E3 prediction that internal geometry provides a bridge to behavioral thresholds

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
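The rule for this list can be stated as a filter: keep entities whose cosine similarity to this one is at least 0.65 and that share no typed edge with it, then sort by score. The sketch below implements that filter. It assumes each entity has a precomputed embedding vector and that typed edges are (source, destination) pairs with direction ignored. The page does not describe how the embeddings are produced, and the function names and toy ids are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def similar_untyped(target_id, embeddings, typed_edges, threshold=0.65):
    """Entities with cosine >= threshold to target_id and no typed edge to it.

    embeddings:  dict of entity id -> vector (hypothetical representation)
    typed_edges: iterable of (source_id, dest_id) pairs, direction ignored
    Returns (entity_id, score) pairs, highest score first.
    """
    target = embeddings[target_id]
    linked = set()
    for src, dst in typed_edges:
        if src == target_id:
            linked.add(dst)
        elif dst == target_id:
            linked.add(src)
    hits = []
    for eid, vec in embeddings.items():
        if eid == target_id or eid in linked:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((eid, score))
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits


# Toy 3-d embeddings with hypothetical ids.
embeddings = {
    "finding:current": [1.0, 0.0, 0.0],
    "finding:near-duplicate": [0.9, 0.1, 0.0],
    "hypothesis:linked": [0.8, 0.2, 0.0],
    "finding:unrelated": [0.0, 1.0, 0.0],
}
typed_edges = [("hypothesis:linked", "finding:current")]  # e.g. a "supports" edge
result = similar_untyped("finding:current", embeddings, typed_edges)
print(result)
```

In the toy data, the linked hypothesis is excluded even though it is similar, because it already has a typed edge. The unrelated finding falls below the threshold. Only the near-duplicate remains, which matches the kind of candidate this list is meant to surface.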

Correlation between layer-wise S scores and task accuracy: ρ = −0.73, p < 0.001 · finding · 0.880
Shows S predicts anchoring effectiveness.
Math and code tasks show strongest mid-layer anchoring on LLaMA (S ≈ −1.65 at layers 8-12) · finding · 0.812
Task-specific E3 finding showing compositional reasoning requires deeper processing
Llama-3.3-70B corrected response scores 75/100 rather than 100 due to residual steering effects (Snell's law reference) · finding · 0.803
Illustrative finding that ESR mitigates but does not fully eliminate steering influence
Fine-tuning Llama-3.1-8B on self-correction examples increases the multi-attempt rate in proportion to the training-data ratio · finding · 0.801
Shows behavioral pattern of self-correction is trainable in smaller models
The case at approximately two-thirds depth in LLaMA3.1-8B (layer 24, satisfying Criteria 1 and 2) aligns with prior studies showing that the layer at about two-thirds depth best predicts human brain activity · finding · 0.800
Connects this study's results to Schrimpf et al. 2021 and Caucheteux et al. 2022/2023 findings on brain-LLM alignment.
Systematic degradation of S(ℓ) across layers 20-28 on LLaMA, reaching S ≈ −2.40 by layer 27 · finding · 0.791
Validates representational drift theory: later layers specialize for next-token prediction, increasing dr
LLaMA-3.1-8B-Instruct wellbeing introspection: ρ = 0.93, isotonic R² = 0.90 (LMM probe slope p < 10⁻¹⁰) · finding · 0.787
Near-ceiling introspective performance for wellbeing concept in 8B model; nearly deterministic probe-report relationship
LLaMA-3.1-8B: S_max = −1.896 ± 0.211, AUS_N = −2.119 ± 0.198, peak layer ℓ* = 10 (median) · finding · 0.783
Seed-pooled geometry-only statistics (per-dev z units).
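The hypothesis and the last neighbor both rely on two summaries of a layer-wise trajectory S(ℓ): a peak value with its layer, and a normalized area. The page does not define either, so the sketch below uses hypothetical definitions. S_max is the largest S over layers, and ℓ* is the earliest layer that reaches it. AUS_N is the trapezoidal area under S(ℓ) with unit layer spacing, divided by the layer span. Pooling across seeds is assumed to mean the mean and sample standard deviation of each summary plus the median peak layer. The page's "per-dev z units" normalization is not reproduced. All names and data here are illustrative.

```python
import statistics


def trajectory_summary(s_by_layer):
    """Return (s_max, aus_n, peak_layer) for one S trajectory over layers 0..L-1.

    Hypothetical definitions: s_max is the largest S value, peak_layer its
    index (earliest layer on ties), and aus_n the trapezoidal area under the
    trajectory with unit layer spacing divided by the layer span L - 1.
    """
    if len(s_by_layer) < 2:
        raise ValueError("need at least two layers")
    peak_layer = max(range(len(s_by_layer)), key=lambda layer: s_by_layer[layer])
    s_max = s_by_layer[peak_layer]
    area = sum((a + b) / 2 for a, b in zip(s_by_layer, s_by_layer[1:]))
    aus_n = area / (len(s_by_layer) - 1)
    return s_max, aus_n, peak_layer


def pool_over_seeds(trajectories):
    """Mean and sample std of s_max and aus_n across seeds, plus the median peak layer."""
    if len(trajectories) < 2:
        raise ValueError("need at least two seeds for a standard deviation")
    summaries = [trajectory_summary(t) for t in trajectories]
    s_max = [s for s, _, _ in summaries]
    aus_n = [a for _, a, _ in summaries]
    peaks = [p for _, _, p in summaries]
    return {
        "s_max": (statistics.mean(s_max), statistics.stdev(s_max)),
        "aus_n": (statistics.mean(aus_n), statistics.stdev(aus_n)),
        "peak_layer_median": statistics.median(peaks),
    }


# Two made-up seeds over five layers (not the paper's data).
seeds = [
    [-2.3, -1.8, -1.9, -2.2, -2.5],
    [-2.4, -2.0, -1.7, -2.1, -2.6],
]
pooled = pool_over_seeds(seeds)
print(pooled)
```

Under these definitions AUS_N cannot exceed S_max, because an average of values along the trajectory is never larger than its maximum. That property is consistent with the ordering of the two reported means (−2.119 versus −1.896), but it does not confirm that the paper uses these definitions.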