claim
active
claim:a-linear-reflection-direction-exists-in-reasoning-llms-latent-representation-space-that-governs-self-reflection-behavior
A linear reflection direction exists in reasoning LLMs' latent representation space that governs self-reflection behavior
Core claim of ReflCtrl that a single direction captures and controls reflection
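As a rough illustration of what "a single direction captures and controls" a behavior can mean, the sketch below builds a unit direction as the difference of mean activations between two groups and then shifts a hidden state along it. This is a generic difference-of-means steering sketch, not the procedure from the source paper: the function names, the toy 3-dimensional activations, and the scaling factor `alpha` are all assumptions made for illustration.

```python
import math

# Hypothetical sketch of a linear "direction" in activation space.
# Assumption: the direction is a difference of group means; the source
# does not state how ReflCtrl actually extracts its direction.

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def unit_direction(reflect_acts, plain_acts):
    """Normalized difference between the two group means."""
    diff = [a - b for a, b in zip(mean_vector(reflect_acts), mean_vector(plain_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def projection(hidden, direction):
    """Scalar component of a hidden state along the direction."""
    return sum(h * d for h, d in zip(hidden, direction))

def steer(hidden, direction, alpha):
    """Shift a hidden state by alpha along the direction (negative alpha suppresses)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Toy activations (illustrative values only, not real model data).
reflect_acts = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]  # states at "reflection" steps
plain_acts = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]    # states at other steps

direction = unit_direction(reflect_acts, plain_acts)
hidden = [1.0, 5.0, 2.0]
suppressed = steer(hidden, direction, -projection(hidden, direction))
amplified = steer(hidden, direction, 2.0)
```

In this toy setup, subtracting a state's own projection removes its component along the direction, while a positive `alpha` increases it; the other coordinates are left unchanged, which is the sense in which the intervention is linear.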
Source paper
extracted_from · (2025) · Ge Yan · Chung-En Sun · Tsui-Wei Weng
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The underlying mechanism of self-reflection in reasoning LLMs is not yet well understood · question · 0.845 · Broad gap motivating the entire paper
- Open question motivating the entire paper; identified as not yet well understood
- Key limitation and open question about experimental scope.
- Interpretive claim about the locus of reflection in transformer architecture.
- Theoretical limitation identified by the authors distinguishing reflection from stylistic tasks.
- Central interpretive claim of the paper, supported by steering vector experiments.
- Central interpretive claim of the paper
- What are the specific attention heads or MLP neurons (circuits) responsible for self-reflection in LLMs? · question · 0.803 · Future research question about pinpointing fine-grained mechanistic components of reflection.