finding
active
Correlation between layer-wise S scores and task accuracy: ρ = -0.73, p < 0.001
Shows that S predicts anchoring effectiveness.
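As a rough illustration of how a layer-wise correlation of this kind could be computed, the sketch below implements a rank correlation in plain Python. The page does not say which coefficient ρ denotes, so Spearman's rank correlation is an assumption here. The function names and the input lists are hypothetical placeholders, not the paper's code or data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical placeholder values (one entry per layer), NOT data from the paper.
layer_scores = [0.9, 0.4, -0.2, -0.8, -1.5]
layer_accuracy = [0.55, 0.60, 0.70, 0.82, 0.91]
rho = spearman(layer_scores, layer_accuracy)
```

The toy inputs are strictly monotone in opposite directions, so `rho` comes out at approximately -1. Real per-layer measurements would give an intermediate value, and a significance test would be needed to obtain a p-value.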
Source paper
extracted_from · (2025) · Edward Yi Chang · Zeyneb N. Kaya · Ethan Chang
Neighborhood — ranked by edge-count
Claims (5)
claim
- Main interpretation of E3.
- A central claim about the operational value of S.
- S = ρd − dr − log k is a predictive correlate of anchoring success across few-shot, SFT, and CoT. · supports · UCCT's practical utility claim.
- Claim that S predicts threshold midpoints across different bases, tasks, and models.
- Clarifies nature of S.
Communities (3)
community
- Few-shot anchoring & latent structure · members_of · How minimal examples disambiguate and recruit latent arithmetic/reasoning interpretations in LLMs
- Silhouette-based metrics (Sbmax, AUSN) across LLM layers predict task accuracy and few-shot thresholds.
- Predictive metric S = ρd - dr - log k quantifies when LLM behavior sharply transitions across few-shot, SFT, and CoT settings via layer-wise calibration.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core E3 finding validating S as a predictor of anchoring effectiveness
- Strength comparison accuracy averages 47% at layers 15-30, indistinguishable from 50% chance · finding · 0.783 · Shows collapse of introspective capability at later layers in the strength comparison task
- Compute per-layer S(ℓ) = ρ̃d(ℓ) - d̃r(ℓ) - log k after whitening and standardization.
- Task-specific peak anchoring score for structured reasoning domains.
- Strength comparison accuracy reaches 73% at layer 3 for injection pair (2,6) vs. 50% chance · finding · 0.760 · Secondary positive result for strength comparison showing graded sensitivity to perturbation magnitude
- Key quantitative evidence that detection signal is identical to global logit shift confound
- (ii) does the anchoring score S = ρd − dr − log k consistently correlate with performance across anchoring methods? · question · 0.749 · Second research question in E2
- Text-based and self-steered emotionality ratings for SAE features are correlated at only ρ = +0.051 (n.s.). · finding · 0.747 · Shows low agreement between the two evaluation modalities
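
The per-layer score listed above, S(ℓ) = ρ̃d(ℓ) - d̃r(ℓ) - log k, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tildes are read as z-score standardization across layers, the logarithm is taken as natural, and the whitening step is omitted because the page does not describe it. The names `zscore` and `layerwise_s` and the input values are hypothetical.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Standardize a sequence to zero mean and unit population variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant sequence")
    return [(v - mu) / sigma for v in values]


def layerwise_s(rho_d, d_r, k):
    """Return S(l) = z(rho_d)(l) - z(d_r)(l) - ln(k) for each layer l.

    Assumption: standardization runs across layers and log is natural;
    any whitening of the underlying representations is not modeled here.
    """
    if len(rho_d) != len(d_r):
        raise ValueError("rho_d and d_r need one value per layer")
    if k < 1:
        raise ValueError("k must be at least 1")
    log_k = math.log(k)
    return [p - d - log_k for p, d in zip(zscore(rho_d), zscore(d_r))]


# Hypothetical per-layer inputs, NOT values from the paper.
rho_d = [1.0, 2.0, 3.0]
d_r = [3.0, 2.0, 1.0]
scores = layerwise_s(rho_d, d_r, k=1)
```

With these toy inputs the score rises from the first layer to the last, since ρd increases while dr decreases. Setting `k` above 1 shifts every layer's score down by the same amount, ln k.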