finding
active
finding:layer-27-last-layer-has-largest-projection-magnitude-on-the-reflection-direction-among-all-attention-head-layers-in-deepseek-r1-qwen-1-5b
Layer 27 (last layer) has largest projection magnitude on the reflection direction among all attention head layers in DeepSeek-R1-Qwen-1.5B
Attribution finding suggesting the last layer directly controls reflection keyword generation
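To make "projection magnitude on the reflection direction" concrete, here is a minimal sketch of one way such a per-layer comparison could be computed. It is an illustration under stated assumptions, not the source paper's method: the metric (mean absolute projection onto a unit-normalized direction), the helper name `layer_projection_magnitudes`, the 28-layer zero-indexed layout, and all data are hypothetical, and the activations are synthetic with the last layer deliberately aligned to the direction.

```python
import math
import random


def layer_projection_magnitudes(layer_outputs, direction):
    """Mean absolute projection of each layer's outputs onto a unit direction.

    layer_outputs: list of layers, each a list of token vectors (lists of floats).
    direction: list of floats (normalized to unit length internally).
    Hypothetical helper for illustration only.
    """
    norm = math.sqrt(sum(x * x for x in direction))
    unit = [x / norm for x in direction]
    magnitudes = []
    for tokens in layer_outputs:
        # Scalar projection of every token vector onto the unit direction.
        projections = [sum(h * u for h, u in zip(vec, unit)) for vec in tokens]
        magnitudes.append(sum(abs(p) for p in projections) / len(projections))
    return magnitudes


# Synthetic stand-in data (assumption): 28 layers indexed 0..27, 8 tokens, 32 dims.
rng = random.Random(0)
num_layers, num_tokens, hidden_dim = 28, 8, 32
direction = [rng.gauss(0.0, 1.0) for _ in range(hidden_dim)]
layer_outputs = [
    [[rng.gauss(0.0, 1.0) for _ in range(hidden_dim)] for _ in range(num_tokens)]
    for _ in range(num_layers)
]

# Deliberately push the last layer along the direction so it dominates.
dir_norm = math.sqrt(sum(x * x for x in direction))
for vec in layer_outputs[-1]:
    for i in range(hidden_dim):
        vec[i] += 5.0 * direction[i] / dir_norm

magnitudes = layer_projection_magnitudes(layer_outputs, direction)
top_layer = max(range(num_layers), key=lambda i: magnitudes[i])
```

With real activations, `layer_outputs` would hold each layer's attention output and `direction` would be the extracted reflection direction; the layer with the largest value in `magnitudes` is the one the finding singles out.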
Source paper
extracted_from (2025) · Ge Yan · Chung-En Sun · Tsui-Wei Weng
Neighborhood — ranked by edge-count
Claims (1)
claim
- Interpretive claim from attention head attribution analysis in appendix
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Structural finding about which attention heads control reflection behavior
- Supports claim that uncertainty is encoded in reflection direction
- Striking mechanistic finding that injection creates universally detectable perturbation in residual stream immediately downstream
- Reflection-inducing directions emerge more clearly in higher layers (ℓ>5) for both models and datasets · finding · 0.801 · Empirical observation about which network layers encode reflection-relevant information.
- Thought detection peaks at ~2/3 layer depth; intention checking peaks at ~1/2 layer depth. · finding · 0.781 · Lindsey (2026) differential layer performance explained by Janus's path combinatorics: different tasks use different path distributions.
- Median layer where S(ℓ) peaks, across seeds.
- Argues against the single-layer analysis approach of prior work.
- Truth-related directions reliably emerge at 60–75% of normalized layer depth in Qwen and Gemma models · finding · 0.764 · Experiment 1 finding localizing where truth can be causally mediated