finding

active

finding:correlation-between-layer-wise-s-scores-and-task-accuracy-0-73-p-0-001

Correlation between layer-wise S scores and task accuracy: ρ = −0.73, p < 0.001

Shows that the layer-wise anchoring score S predicts anchoring effectiveness.
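As a rough illustration of how a correlation of this kind could be computed, the sketch below pairs one S score per layer with one accuracy value per layer and reports a rank correlation with a permutation p-value. It assumes that ρ denotes Spearman's rank correlation, which this page does not state, and the helper names (`average_ranks`, `spearman_rho`, `permutation_p_value`) are hypothetical and not taken from the source paper. The permutation test is a stand-in for whatever significance test the authors used.

```python
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(x, y, n_permutations=10000, seed=0):
    """Two-sided p-value: share of shuffles with |rho| at least as large."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(spearman_rho(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)
```

Calling `spearman_rho(s_per_layer, accuracy_per_layer)` on two equal-length lists gives the coefficient, and `permutation_p_value` on the same inputs gives an approximate significance level. The smallest p-value this test can report is 1 / (n_permutations + 1), so a claim such as p < 0.001 needs more than 1,000 permutations.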

Source paper

extracted_from

The Guanyin Protocol: A Framework for Immediately Establishing an Understanding of Both Causality and Compassion in LLM Systems Using Semantic Anchoring

(2025) · Edward Yi Chang · Kaya, Zeyneb N. · Ethan Chang

Neighborhood — ranked by edge-count

Claims (5)

claim

Peak anchoring Sbmax and normalized area AUSN correlate with per-item success and internal shot midpoints θ50, providing a geometry-to-behavior bridge.
supports
Main interpretation of E3.
The anchoring score S is a predictive correlate of when anchoring succeeds and why small prompt changes yield threshold-like shifts.
supports
A central claim about the operational value of S.
S = ρd − dr − log k is a predictive correlate of anchoring success across few-shot, SFT, and CoT.
supports
UCCT's practical utility claim.
S = ρd − dr − log k is a predictive correlate of when few-shot behavior flips.
supports
Claim that S predicts threshold midpoints across different bases, tasks, and models.
S is a predictive correlate calibrated on dev sets, not an absolute measure.
supports
Clarifies nature of S.

Communities (3)

community

Few-shot anchoring & latent structure
members_of
How minimal examples disambiguate and recruit latent arithmetic/reasoning interpretations in LLMs
Layer-wise geometry predicting few-shot learning
members_of
Silhouette-based metrics (Sbmax, AUSN) across LLM layers predict task accuracy and few-shot thresholds.
Anchoring score S for few-shot learning transitions
members_of
Predictive metric S = ρd - dr - log k quantifies when LLM behavior sharply transitions across few-shot, SFT, and CoT settings via layer-wise calibration.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Correlation between layer-wise scores and task accuracy ρ = −0.73 (p < 0.001) on LLaMA · finding · 0.880
Core E3 finding validating S as a predictor of anchoring effectiveness
Strength comparison accuracy averages 47% at layers 15–30, indistinguishable from 50% chance · finding · 0.783
Shows collapse of introspective capability at later layers in the strength comparison task
Layer-wise anchoring score S(ℓ) computation · method · 0.774
Compute per-layer S(ℓ) = ρ̃d(ℓ) - d̃r(ℓ) - log k after whitening and standardization.
Math/code tasks S ≈ −1.65 at layers 8–12 · finding · 0.766
Task-specific peak anchoring score for structured reasoning domains.
Strength comparison accuracy reaches 73% at layer 3 for injection pair (2,6) vs. 50% chance · finding · 0.760
Secondary positive result for strength comparison showing graded sensitivity to perturbation magnitude
Correlation r = 0.999 between detection-adjusted logit difference and control logit increase across all 40 layer-strength configurations · finding · 0.754
Key quantitative evidence that the detection signal is identical to the global logit-shift confound
(ii) Does the anchoring score S = ρd − dr − log k consistently correlate with performance across anchoring methods? · question · 0.749
Second research question in E2
Text-based and self-steered emotionality ratings for SAE features are correlated at only ρ = +0.051 (n.s.). · finding · 0.747
Shows low agreement between the two evaluation modalities
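
The per-layer formula listed above, S(ℓ) = ρ̃d(ℓ) − d̃r(ℓ) − log k, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the raw per-layer quantities ρd and dr are treated as opaque inputs because this page does not define them, the whitening step is assumed to have happened upstream, "standardization" is assumed to mean a z-score across layers, and the logarithm is assumed to be natural. All function names are hypothetical.

```python
import math


def zscore(values):
    """Standardize to zero mean and unit variance (ASSUMED meaning of the tilde)."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in values]


def layerwise_anchoring_score(rho_d, d_r, k):
    """Compute S(l) = z(rho_d)(l) - z(d_r)(l) - log k for every layer l.

    rho_d, d_r: one raw value per layer (inputs not defined on this page).
    k: positive number entering the log k penalty (natural log ASSUMED).
    """
    if len(rho_d) != len(d_r):
        raise ValueError("rho_d and d_r need one value per layer each")
    if k <= 0:
        raise ValueError("k must be positive")
    rho_tilde = zscore(rho_d)
    d_tilde = zscore(d_r)
    log_k = math.log(k)
    return [r - d - log_k for r, d in zip(rho_tilde, d_tilde)]


def peak_layer(scores):
    """Index of the layer with the highest S(l)."""
    return max(range(len(scores)), key=lambda i: scores[i])
```

The resulting list has one S value per layer, which is the kind of input a layer-wise correlation against task accuracy would need. Because the claims on this page describe S as calibrated on dev sets rather than absolute, any standardization statistics would presumably be fitted on held-out data in practice.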