Reflection direction extraction

Computes the reflection direction as the mean difference between reflection and non-reflection steps, using the MLP and attention output representations of each step's first token.
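The extraction step described above amounts to a difference of means. The following is a minimal plain-Python sketch, not the ReflCtrl implementation. It assumes the first-token representations have already been collected as float vectors and grouped under a hypothetical `(layer, component)` key, where component is `"mlp"` or `"attn"`. The helper names `mean_vector` and `extract_reflection_direction` are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
# Hypothetical key: (layer index, component), component being "mlp" or "attn".
Key = Tuple[int, str]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def extract_reflection_direction(
    reflection: Dict[Key, List[Vector]],
    non_reflection: Dict[Key, List[Vector]],
) -> Dict[Key, Vector]:
    """Per key: mean(reflection reps) - mean(non-reflection reps).

    Each list is assumed to hold the first-token output representation of
    every step of that class, taken from one layer's MLP or attention output.
    """
    directions: Dict[Key, Vector] = {}
    for key, refl_vectors in reflection.items():
        mu_r = mean_vector(refl_vectors)
        mu_n = mean_vector(non_reflection[key])
        directions[key] = [a - b for a, b in zip(mu_r, mu_n)]
    return directions


if __name__ == "__main__":
    refl = {(0, "mlp"): [[1.0, 2.0], [3.0, 4.0]]}
    non = {(0, "mlp"): [[0.0, 1.0], [2.0, 1.0]]}
    print(extract_reflection_direction(refl, non))  # {(0, 'mlp'): [1.0, 2.0]}
```

Keeping one direction per layer and component mirrors the summary's mention of both MLP and attention outputs. Whether the original method combines them or keeps them separate is not stated here, so this split is an assumption.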

Neighborhood — ranked by edge-count

Frameworks (1)

ReflCtrl · framework · uses
The proposed framework for probing and steering self-reflection behavior in reasoning LLMs via representation engineering.

Datasets (1)

GSM8K · dataset · uses
Grade-school math dataset used for the math task in E3.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Reflection direction · concept · 0.887
A direction in the model's representation space that governs self-reflection behavior, computed as the mean difference between reflection and non-reflection embeddings.
Latent Direction of Reflection · concept · 0.813
The paper's central construct: a vector in LLM activation space encoding the transition between reflection levels.
Cosine projection on reflection direction · method · 0.787
A feature extraction method that computes the cosine similarity of hidden representations with the reflection direction across all layers.
Reflection Symmetry · concept · 0.776
One of four key isometries: reflection across a line (the mirror line or axis of reflection).
Situational Reflection · concept · 0.767
The specific form of reflection studied, in which a model reflects on reasoning generated by another source.
Described Reflection · concept · 0.765
Responses that name or describe the observing act without performing it; negatively correlated with high scores.
Reflection Enhancement via Activation Addition · method · 0.763
Adds the steering vector in the forward direction to push model activations toward stronger reflective behavior.
Self-reflection · concept · 0.763
The ability of reasoning LLMs to review and revise previous reasoning steps during inference.
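Two of the related methods listed above, cosine projection on the reflection direction and reflection enhancement via activation addition, can be sketched on plain vectors as follows. This is an illustrative sketch, not the source's code. It assumes hidden states and directions are already extracted per layer. It also assumes that steering adds a hypothetical strength `alpha` times the unit-normalized direction; the normalization and the names `cosine_features` and `add_activation` are assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def cosine_features(
    hidden_by_layer: Sequence[Vector], direction_by_layer: Sequence[Vector]
) -> List[float]:
    """One feature per layer: cosine of that layer's hidden state with its direction."""
    return [cosine(h, d) for h, d in zip(hidden_by_layer, direction_by_layer)]


def add_activation(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Steer by adding alpha * unit(direction); assumed: alpha > 0 pushes toward reflection."""
    norm = math.sqrt(sum(x * x for x in direction))
    if norm == 0:
        return list(hidden)
    return [h + alpha * d / norm for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    d = [3.0, 4.0]
    h = [1.0, 0.0]
    print(cosine(h, d))               # 0.6
    print(add_activation(h, d, 5.0))  # [4.0, 4.0]
```

Normalizing the direction makes `alpha` read as a step length in activation space. In a real model, the addition would be applied inside the forward pass (for example via a hook) rather than on detached vectors as here.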