finding
active
finding:attention-heads-with-positive-projection-on-reflection-direction-are-sparse-and-located-mostly-in-deeper-layers-of-deepseek-r1-qwen-1-5b
Attention heads with positive projection on reflection direction are sparse and located mostly in deeper layers of DeepSeek-R1-Qwen-1.5B
Structural finding about which attention heads control reflection behavior
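The entry does not say how the projection was computed. As a rough illustration only, the sketch below assumes each head's output is available as a vector in the residual stream and that the reflection direction is a single vector of the same width. The function names, array shapes, and toy values are hypothetical and are not taken from the source paper.

```python
import numpy as np

def head_projections(head_outputs, direction):
    """Scalar projection of each head's output onto a direction.

    head_outputs: array of shape (n_layers, n_heads, d_model) (assumed layout)
    direction:    array of shape (d_model,)
    returns:      array of shape (n_layers, n_heads)
    """
    unit = direction / np.linalg.norm(direction)  # normalise to unit length
    return head_outputs @ unit                    # dot product over d_model

def positive_heads(projections, threshold=0.0):
    """Return (layer, head) pairs whose projection exceeds the threshold."""
    layers, heads = np.nonzero(projections > threshold)
    return list(zip(layers.tolist(), heads.tolist()))

# Toy data (hypothetical sizes, not the real model's dimensions).
n_layers, n_heads, d_model = 4, 3, 8
direction = np.zeros(d_model)
direction[0] = 2.0

# Every head points against the direction, except two heads in later layers.
head_outputs = np.zeros((n_layers, n_heads, d_model))
head_outputs[:, :, 0] = -1.0
head_outputs[2, 1, 0] = 1.0
head_outputs[3, 0, 0] = 1.0

projections = head_projections(head_outputs, direction)
selected = positive_heads(projections)

# Fraction of heads per layer with a positive projection.
positive_fraction_per_layer = (projections > 0).mean(axis=1)
```

In this toy setup only two of the twelve heads have a positive projection and both sit in the last two layers, which mirrors the sparse, deeper-layer pattern the finding describes.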
Source paper
extracted_from (2025) · Ge Yan · Chung-En Sun · Tsui-Wei Weng
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Attribution finding suggesting the last layer directly controls reflection keyword generation
- Striking mechanistic finding that injection creates universally detectable perturbation in residual stream immediately downstream
- Reflection-inducing directions emerge more clearly in higher layers (ℓ>5) for both models and datasets · finding · 0.799 · Empirical observation about which network layers encode reflection-relevant information.
- Empirical observation from examining expanded OV/QK matrices; approximately 10 out of 12 heads show significant copying
- Quantitative result from eigenvalue analysis of expanded OV matrices; confirmed by qualitative inspection
- What are the specific attention heads or MLP neurons (circuits) responsible for self-reflection in LLMs? · question · 0.780 · Future research question about pinpointing fine-grained mechanistic components of reflection.
- Thought detection peaks at ~2/3 layer depth; intention checking peaks at ~1/2 layer depth. · finding · 0.771 · Lindsey (2026) differential layer performance explained by Janus's path combinatorics: different tasks use different path distributions.
- Extension of superposition hypothesis to attention layers as future research direction