finding
active
finding:asr-spikes-rapidly-in-all-tested-models-in-the-0-60-0-75-normalized-layer-range-before-decreasing-sharply-in-final-layers
ASR spikes rapidly in all tested models in the 0.60–0.75 normalized layer range before decreasing sharply in final layers
Core layer localization finding from Experiment 1
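To make the finding concrete, here is a minimal sketch of how a per-layer ASR curve might be checked against the 0.60–0.75 window. It assumes normalized depth is the layer index divided by the index of the last layer, which the source does not state; the function names and the toy curve are hypothetical and are not results from the paper.

```python
from typing import List, Tuple


def normalized_depths(num_layers: int) -> List[float]:
    """Map layer indices 0..num_layers-1 onto [0, 1].

    Assumed convention: depth = index / (num_layers - 1).
    """
    if num_layers < 2:
        raise ValueError("need at least two layers")
    return [i / (num_layers - 1) for i in range(num_layers)]


def peak_layer(asr_by_layer: List[float]) -> Tuple[int, float]:
    """Return (layer index, normalized depth) where ASR is highest."""
    depths = normalized_depths(len(asr_by_layer))
    idx = max(range(len(asr_by_layer)), key=lambda i: asr_by_layer[i])
    return idx, depths[idx]


def peaks_in_window(asr_by_layer: List[float],
                    lo: float = 0.60, hi: float = 0.75) -> bool:
    """True if the ASR peak falls inside the normalized-depth window."""
    _, depth = peak_layer(asr_by_layer)
    return lo <= depth <= hi


# Synthetic, made-up curve for an 11-layer model (illustration only):
# low ASR early, a sharp rise, then a drop in the final layers.
toy_asr = [0.05, 0.05, 0.1, 0.1, 0.15, 0.2, 0.6, 0.9, 0.5, 0.2, 0.1]

if __name__ == "__main__":
    idx, depth = peak_layer(toy_asr)
    print(f"peak at layer {idx}, normalized depth {depth:.2f}")
    print("inside 0.60-0.75 window:", peaks_in_window(toy_asr))
```

Normalizing by depth rather than raw layer index is what lets models with different layer counts be compared on one axis.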
Source paper
extracted_from (2025) · Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Clayton Lau +4
Neighborhood — ranked by edge-count
Claims (1)
claim
- Central interpretive claim of the paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Shows that explicit instructions delay the emergence of truth directions in arithmetic tasks
- Characterizes the narrow operating window in which ESR can manifest
- Small Gemma model shows severe ASR degradation at higher cone dimensions
- Experiment 2 result showing large Gemma model supports high-dimensional truth cones
- Median layer where S(ℓ) peaks, across seeds
- Supports claim that uncertainty is encoded in reflection direction
- Smaller models show non-monotonic and diminished ASR with increasing cone dimensionality
- Concrete numerical example showing detection and control are nearly identical at peak apparent accuracy