claim

active

claim:the-last-layer-of-the-transformer-has-the-largest-projection-magnitude-on-the-reflection-direction-likely-because-it-directly-controls-generation-of-reflection-keywords

The last layer of the transformer has the largest projection magnitude on the reflection direction, likely because it directly controls generation of reflection keywords

Interpretive claim drawn from the attention-head attribution analysis in the paper's appendix

Source paper

extracted_from

ReflCtrl: Controlling LLM Reflection via Representation Engineering

(2025) · Ge Yan · Chung-En Sun · Tsui-Wei Weng

Neighborhood — ranked by edge-count

Findings (1)

finding

Layer 27 (the last layer) has the largest projection magnitude on the reflection direction among all attention-head layers in DeepSeek-R1-Qwen-1.5B
supports
Attribution finding suggesting the last layer directly controls reflection keyword generation
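
The comparison behind this finding can be illustrated with a minimal sketch: project each layer's output vector onto a unit-normalized direction and pick the layer with the largest absolute projection. The function name `projection_magnitudes`, the toy vectors, and the three-dimensional hidden size are all assumptions made for illustration. This is not the paper's attribution procedure, and the numbers are not results from it.

```python
import math


def projection_magnitudes(layer_outputs, direction):
    """Return |<h_l, d_hat>| for each layer vector h_l.

    layer_outputs: list of per-layer vectors (hypothetical stand-ins for
                   aggregated attention-head outputs).
    direction:     the direction to project onto (hypothetical stand-in
                   for a reflection direction).
    """
    # Normalize the direction so magnitudes are comparable across layers.
    norm = math.sqrt(sum(x * x for x in direction))
    unit = [x / norm for x in direction]
    # Absolute value of the dot product with the unit direction.
    return [abs(sum(h * u for h, u in zip(vec, unit))) for vec in layer_outputs]


# Toy data only: 4 "layers", hidden size 3.
layer_outputs = [
    [0.1, 0.0, 0.2],
    [0.0, 0.5, 0.1],
    [0.3, 0.1, 0.0],
    [2.0, 0.0, 1.0],
]
direction = [1.0, 0.0, 0.0]

magnitudes = projection_magnitudes(layer_outputs, direction)
# Index of the layer with the largest projection magnitude.
top_layer = max(range(len(magnitudes)), key=lambda i: magnitudes[i])
```

In this toy setup the last index wins by construction. With a real model, the per-layer vectors would have to be captured from the network (for example through forward hooks), which the sketch does not cover.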

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Reflection-inducing directions emerge more clearly in higher layers (ℓ>5) for both models and datasets · finding · 0.793
Empirical observation about which network layers encode reflection-relevant information.
Cosine projection on reflection direction · method · 0.763
Feature extraction method computing cosine similarity of hidden representations with reflection direction across all layers
Different introspective tasks may preferentially use different path distributions in the transformer. · claim · 0.761
Interpretive claim connecting exponential path combinatorics to Lindsey's layer-dependent findings.
The model converges to a more stable truth direction in middle-to-late layers, as evidenced by increasing cosine similarity between layer-wise probes. · claim · 0.756
Supported by the geometric transition visible in cosine similarity heatmaps for F0-F3.
A linear reflection direction exists in reasoning LLMs' latent representation space that governs self-reflection behavior · claim · 0.753
Core claim of ReflCtrl that a single direction captures and controls reflection
Transformers are recurrent through autoregression because K/V stream provides horizontal information flow across positions. · claim · 0.751
Claim formalizing the Anima Labs idea that transformers are effectively recurrent due to K/V stream.
Reflective reasoning requires late-stage integration of semantic and reasoning signals, hence reflection-related directions emerge more clearly in higher network layers. · claim · 0.751
Interpretive claim about the locus of reflection in transformer architecture.
No single layer is universally optimal for probing truth directions; different tasks peak at different layers. · claim · 0.751
Argues against the single-layer analysis approach of prior work.