method
active
method:reflection-direction-extraction
Reflection direction extraction
Computes the reflection direction as the difference between the mean MLP and attention output representations of the first tokens of reflection steps and those of non-reflection steps.
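The difference-of-means computation described above can be sketched as follows. This is a minimal illustration, not the source's implementation: it assumes the first-token activations (MLP or attention outputs at one layer) have already been collected into two arrays, and the function name `reflection_direction` and the toy values are hypothetical.

```python
import numpy as np

def reflection_direction(reflect_acts, non_reflect_acts):
    """Difference of class means over first-token activations.

    Both inputs are assumed to have shape (n_steps, hidden_dim) and to hold
    activations taken at the first token of each reasoning step.
    """
    reflect_acts = np.asarray(reflect_acts, dtype=float)
    non_reflect_acts = np.asarray(non_reflect_acts, dtype=float)
    # Mean over steps in each class, then subtract the non-reflection mean.
    return reflect_acts.mean(axis=0) - non_reflect_acts.mean(axis=0)

# Toy activations with hidden_dim = 2 (made-up numbers, for illustration only).
reflect = np.array([[1.0, 2.0], [3.0, 4.0]])
non_reflect = np.array([[0.0, 0.0], [1.0, 1.0]])
direction = reflection_direction(reflect, non_reflect)
```

In practice the same computation would presumably be repeated per layer and per component (MLP output, attention output), giving one direction vector for each.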
Neighborhood — ranked by edge-count
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- A direction in the model's representation space that governs self-reflection behavior, computed as the mean difference between reflection and non-reflection embeddings
- The paper's central construct: a vector in LLM activation space encoding the transition between reflection levels.
- Feature extraction method computing cosine similarity of hidden representations with reflection direction across all layers
- One of four key isometries; reflection across a line (mirror line or axis of reflection).
- The specific form of reflection studied, where a model reflects on reasoning generated by another source.
- Responses that name or describe the observing act without performing it; negatively correlated with high scores
- Adding steering vector in forward direction to push model activations toward stronger reflective behavior.
- The ability of reasoning LLMs to review and revise previous reasoning steps during inference
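Two of the neighboring entities above, the cosine-similarity feature extraction and the forward steering intervention, both consume a reflection direction. A rough sketch of how they might look is given below, assuming one hidden state and one direction vector per layer. The function names `cosine_per_layer` and `steer`, the scaling factor `alpha`, and the toy values are all hypothetical and are not taken from the source.

```python
import numpy as np

def cosine_per_layer(hidden, directions, eps=1e-8):
    """Cosine similarity between hidden states and directions, layer by layer.

    Both inputs are assumed to have shape (n_layers, hidden_dim).
    """
    hidden = np.asarray(hidden, dtype=float)
    directions = np.asarray(directions, dtype=float)
    dots = (hidden * directions).sum(axis=-1)
    norms = np.linalg.norm(hidden, axis=-1) * np.linalg.norm(directions, axis=-1)
    # eps guards against division by zero for all-zero vectors.
    return dots / np.maximum(norms, eps)

def steer(hidden, direction, alpha=1.0):
    """Add the direction, scaled by alpha, to a hidden state (forward steering)."""
    return np.asarray(hidden, dtype=float) + alpha * np.asarray(direction, dtype=float)

# Toy example: two layers, hidden_dim = 2 (made-up numbers).
hidden = np.array([[1.0, 0.0], [0.0, 2.0]])
directions = np.array([[1.0, 0.0], [1.0, 0.0]])
features = cosine_per_layer(hidden, directions)
steered = steer(hidden[1], directions[1], alpha=2.0)
```

The per-layer cosine values form a feature vector with one entry per layer. A positive `alpha` moves the activation along the direction, and a negative value would move it the opposite way.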