method
active
method:reflection-direction-extraction

Reflection direction extraction

Computes the reflection direction as the mean difference between the first-token representations of reflection steps and those of non-reflection steps, taken from the MLP and attention outputs
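A minimal sketch of the mean-difference computation described above, assuming the first-token activations have already been collected into two arrays of shape `(num_steps, hidden_dim)`. The function name `reflection_direction`, the `normalize` flag, and the idea of calling it once per layer and per component (MLP output, attention output) are illustrative assumptions, not details given in this entry.

```python
import numpy as np

def reflection_direction(reflect_acts, non_reflect_acts, normalize=True):
    """Mean-difference direction between two sets of activations.

    reflect_acts, non_reflect_acts: arrays of shape (num_steps, hidden_dim)
    holding first-token representations (hypothetical inputs).
    """
    reflect_acts = np.asarray(reflect_acts, dtype=np.float64)
    non_reflect_acts = np.asarray(non_reflect_acts, dtype=np.float64)

    # Difference of the per-class mean activations.
    direction = reflect_acts.mean(axis=0) - non_reflect_acts.mean(axis=0)

    # Optional unit normalization (an assumption; the entry does not say
    # whether the direction is normalized).
    if normalize:
        norm = np.linalg.norm(direction)
        if norm > 0:
            direction = direction / norm
    return direction

# Toy example with 2-dimensional activations.
reflect = [[1.0, 0.0], [3.0, 0.0]]
non_reflect = [[0.0, 0.0], [0.0, 0.0]]
toy_direction = reflection_direction(reflect, non_reflect)
```

Under these assumptions, the same function would be applied separately to the MLP and attention outputs of each layer, yielding one direction per layer and component.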

Neighborhood — ranked by edge-count

Frameworks (1)

framework
  • The proposed framework for probing and steering self-reflection behavior in reasoning LLMs via representation engineering

Datasets (1)

dataset
  • GSM8K
    uses
    Grade school math dataset used for the math task in E3.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • A direction in the model's representation space that governs self-reflection behavior, computed as mean difference between reflection and non-reflection embeddings
  • The paper's central construct: a vector in LLM activation space encoding the transition between reflection levels.
  • Feature extraction method computing cosine similarity of hidden representations with reflection direction across all layers
  • One of four key isometries; reflection across a line (mirror line or axis of reflection).
  • The specific form of reflection studied, where a model reflects on reasoning generated by another source.
  • Responses that name or describe the observing act without performing it; negatively correlated with high scores
  • Adding steering vector in forward direction to push model activations toward stronger reflective behavior.
  • Self-reflection · concept · 0.763
    The ability of reasoning LLMs to review and revise previous reasoning steps during inference
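
Two of the neighboring entries above describe operations on a reflection direction: a per-layer cosine-similarity feature and additive steering. The sketch below illustrates both under stated assumptions. The names `layerwise_cosine`, `steer`, and `alpha`, and the `(num_layers, hidden_dim)` array layout, are hypothetical and not taken from these entries.

```python
import numpy as np

def layerwise_cosine(hidden_states, directions, eps=1e-12):
    """Cosine similarity between each layer's hidden state and that
    layer's direction. Both inputs have shape (num_layers, hidden_dim)."""
    h = np.asarray(hidden_states, dtype=np.float64)
    d = np.asarray(directions, dtype=np.float64)
    dots = np.sum(h * d, axis=1)
    norms = np.linalg.norm(h, axis=1) * np.linalg.norm(d, axis=1)
    # eps guards against division by zero for all-zero vectors.
    return dots / np.maximum(norms, eps)

def steer(hidden_state, direction, alpha=1.0):
    """Shift an activation along the direction; a positive alpha pushes it
    in the forward direction (alpha is an assumed scaling parameter)."""
    h = np.asarray(hidden_state, dtype=np.float64)
    d = np.asarray(direction, dtype=np.float64)
    return h + alpha * d

# Toy example: two layers, 2-dimensional hidden states.
features = layerwise_cosine([[1.0, 0.0], [0.0, 2.0]],
                            [[1.0, 0.0], [1.0, 0.0]])
steered = steer([0.0, 0.0], [1.0, 0.0], alpha=2.0)
```

In practice the steering step would be applied inside the model's forward pass, for example through a hook on the chosen layer; that wiring is omitted here because the entries do not describe it.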