finding

active

finding:layer-27-last-layer-has-largest-projection-magnitude-on-the-reflection-direction-among-all-attention-head-layers-in-deepseek-r1-qwen-1-5b

Layer 27 (last layer) has the largest projection magnitude on the reflection direction among all attention head layers in DeepSeek-R1-Qwen-1.5B

Attribution finding suggesting the last layer directly controls reflection keyword generation

Source paper

extracted_from

ReflCtrl: Controlling LLM Reflection via Representation Engineering

(2025) · Ge Yan · Chung-En Sun · Tsui-Wei Weng

Neighborhood — ranked by edge-count

Claims (1)

claim

The last layer of the transformer has the largest projection magnitude on the reflection direction, likely because it directly controls generation of reflection keywords
supports
Interpretive claim from attention head attribution analysis in appendix
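
To make the quantity in this finding concrete, here is a minimal sketch of how a per-layer projection magnitude onto a direction could be computed and compared. It is not the paper's implementation: the function names, the toy vectors, and the choice to aggregate heads by averaging their absolute scalar projections are all assumptions made for illustration.

```python
import math


def projection_magnitude(vec, direction):
    """Absolute scalar projection of `vec` onto `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    dot = sum(v * d for v, d in zip(vec, direction))
    return abs(dot) / norm


def layer_with_largest_projection(layer_outputs, direction):
    """Return (layer, score) for the layer with the largest mean projection.

    `layer_outputs` maps a layer index to a list of per-head output vectors.
    Averaging over heads is an assumed aggregation, not taken from the source.
    """
    scores = {
        layer: sum(projection_magnitude(h, direction) for h in heads) / len(heads)
        for layer, heads in layer_outputs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy data: 3 layers, 2 heads each, 2-dimensional outputs.
direction = [1.0, 0.0]
layer_outputs = {
    0: [[0.1, 0.5], [-0.2, 0.3]],
    1: [[0.4, 0.1], [0.2, -0.9]],
    2: [[1.5, 0.2], [-0.5, 0.0]],
}
best_layer, best_score = layer_with_largest_projection(layer_outputs, direction)
print(best_layer, best_score)
```

In this toy setup the last layer (index 2) scores highest, mirroring the shape of the claim. With a real model, the per-head vectors would come from each head's contribution to the residual stream, and the direction from whatever procedure the paper uses to extract it.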

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Attention heads with positive projection on reflection direction are sparse and located mostly in deeper layers of DeepSeek-R1-Qwen-1.5B · finding · 0.860
Structural finding about which attention heads control reflection behavior
Reflection direction features achieve AUROC 0.772 vs. 0.736 for final layer baseline on deepseek-llama-8b on GSM8k correctness prediction · finding · 0.818
Supports claim that uncertainty is encoded in reflection direction
All 32 attention heads at layer 3 achieve 100% localization accuracy for injections at layer 2 (5-way classification, 20% chance) · finding · 0.806
Striking mechanistic finding that injection creates universally detectable perturbation in residual stream immediately downstream
Reflection-inducing directions emerge more clearly in higher layers (ℓ>5) for both models and datasets · finding · 0.801
Empirical observation about which network layers encode reflection-relevant information.
Thought detection peaks at ~2/3 layer depth; intention checking peaks at ~1/2 layer depth. · finding · 0.781
Lindsey (2026) differential layer performance explained by Janus's path combinatorics — different tasks use different path distributions.
Peak layer ℓ* median 10, IQR 0.384 · finding · 0.776
Median layer where S(ℓ) peaks, across seeds.
No single layer is universally optimal for probing truth directions; different tasks peak at different layers. · claim · 0.770
Argues against the single-layer analysis approach of prior work.
Truth-related directions reliably emerge at 60–75% of normalized layer depth in Qwen and Gemma models · finding · 0.764
Experiment 1 finding localizing where truth can be causally mediated