finding
active
The between-to-within-class variance ratio peaks at different layers for different tasks, confirming no single layer is universally optimal.
Supports the claim against single-layer probing approaches used in prior work.
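The finding does not spell out how the ratio is computed. The sketch below assumes a common trace-based definition: the between-class scatter (class-size-weighted squared distances of class means from the overall mean) divided by the within-class scatter (squared distances of samples from their own class mean). It computes this per layer and picks the peak layer for each task. The helper names `variance_ratio` and `best_layer`, and the synthetic data, are hypothetical and for illustration only. The data is built so that class separation is strongest at different layers for two tasks.

```python
import numpy as np


def variance_ratio(features: np.ndarray, labels: np.ndarray) -> float:
    """Trace-based between-to-within-class variance ratio (assumed definition).

    features: (n_samples, n_dims) activations from one layer.
    labels:   (n_samples,) integer class labels.
    """
    overall_mean = features.mean(axis=0)
    between = 0.0
    within = 0.0
    for c in np.unique(labels):
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        # Class-size-weighted squared distance of the class mean from the overall mean.
        between += len(class_feats) * np.sum((class_mean - overall_mean) ** 2)
        # Squared distances of each sample from its own class mean.
        within += np.sum((class_feats - class_mean) ** 2)
    return between / within


def best_layer(layer_features: list, labels: np.ndarray):
    """Return the index of the layer with the highest ratio, plus all ratios."""
    ratios = [variance_ratio(f, labels) for f in layer_features]
    return int(np.argmax(ratios)), ratios


# Hypothetical synthetic data: 3 "layers", 2 classes of 50 samples each, 8 dims.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)


def make_layers(separations):
    # Class 1 is shifted by `s` in every dimension; noise is standard normal.
    return [rng.normal(size=(100, 8)) + np.outer(labels, np.full(8, s))
            for s in separations]


task_a = make_layers([0.1, 2.0, 0.5])  # separation strongest at layer 1
task_b = make_layers([0.1, 0.5, 2.0])  # separation strongest at layer 2

print("task A peak layer:", best_layer(task_a, labels)[0])
print("task B peak layer:", best_layer(task_b, labels)[0])
```

Choosing the probing layer per task by the argmax of this ratio, rather than fixing one layer for all tasks, is the behavior this finding argues for. The real analysis may use a different normalization or a multivariate (Fisher-style) criterion.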
Source paper
extracted_from(2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi
Neighborhood — ranked by edge-count
Claims (1)
claim
- Argues against the single-layer analysis approach of prior work.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Shows that Burger et al.'s layer choice corresponds to a transitional phase, not a universal property.
- Core testable hypothesis of UCCT about the nature of performance transitions under anchoring.
- Performance is best when skipping both the first and last six layers when applying the intervention (claim · 0.739). Empirical configuration finding from an ablation study on layer selection.
- Setting αk to the maximum gradient norm performs best among the tested strategies on NYUv2 (Figure 6) (finding · 0.738). Sensitivity analysis for the gradient normalization scaling factor.
- Recommended strategy for gradient normalization.
- Features smeared across layers cannot be fully disentangled by SAE on a single residual stream.
- Methodological critique of prior work that fixed a single layer for truth probing.