concept
active
concept:reflection-direction

Reflection direction

A direction in the model's representation space that governs self-reflection behavior, computed as the mean difference between reflection and non-reflection embeddings.
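
The mean-difference computation in the definition above can be illustrated with a small sketch. It assumes the reflection and non-reflection embeddings are already available as plain lists of floats; the function names (`mean_vector`, `reflection_direction`) and the toy values are hypothetical and are not taken from the source.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty collection of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def reflection_direction(
    reflection: Sequence[Sequence[float]],
    non_reflection: Sequence[Sequence[float]],
) -> Vector:
    """Mean reflection embedding minus mean non-reflection embedding."""
    mu_reflection = mean_vector(reflection)
    mu_non_reflection = mean_vector(non_reflection)
    if len(mu_reflection) != len(mu_non_reflection):
        raise ValueError("both groups must share the same dimension")
    return [a - b for a, b in zip(mu_reflection, mu_non_reflection)]


# Toy 2-dimensional embeddings (illustrative values only, not real activations).
reflection_embs = [[1.0, 2.0], [3.0, 4.0]]
non_reflection_embs = [[0.0, 1.0], [2.0, 1.0]]

direction = reflection_direction(reflection_embs, non_reflection_embs)
print(direction)
```

In practice the embeddings would come from a model's hidden activations and an array library would replace the list arithmetic; the sketch only shows the difference-of-means step.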

Neighborhood — ranked by edge-count

Methods (1)

method

Concepts (4)

concept
  • The paper's central construct: a vector in LLM activation space encoding the transition between reflection levels.
  • Self-reflection
    associated_with
    The ability of reasoning LLMs to review and revise previous reasoning steps during inference
  • Substrate on which causal emergence was computed across agent lifetimes; aligned with learning success.
  • Internal uncertainty
    associated_with
    The model's internal representation of uncertainty hypothesized to trigger self-reflection

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Computes reflection direction as mean difference between MLP and attention output representations of first tokens in reflection vs. non-reflection steps
  • No Reflection · concept · 0.823
    Reflection level where the model is forced to output an answer immediately without revisiting reasoning.
  • One of four key isometries; reflection across a line (mirror line or axis of reflection).
  • Truth Direction · concept · 0.819
    A hypothesized direction in LLM activation space that encodes the truth or falsehood of factual statements
  • The specific form of reflection studied, where a model reflects on reasoning generated by another source.
  • Responses that name or describe the observing act without performing it; negatively correlated with high scores
  • Reflection rate · concept · 0.790
    Ratio of reflection steps to total reasoning steps, used to quantify reflection behavior
  • Enacted Reflection · concept · 0.785
    Responses that perform the observing act; contrasted with described reflection; scorer rewards enacted over described
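
The reflection rate entry in the list above is defined as the ratio of reflection steps to total reasoning steps. A minimal sketch of that ratio follows, assuming each reasoning step has already been labeled as reflection or not; how steps are labeled is not specified here, and the function name `reflection_rate` and the example labels are hypothetical.

```python
from typing import Sequence


def reflection_rate(is_reflection_step: Sequence[bool]) -> float:
    """Fraction of reasoning steps that are labeled as reflection steps."""
    if not is_reflection_step:
        raise ValueError("need at least one reasoning step")
    n_reflection = sum(1 for flag in is_reflection_step if flag)
    return n_reflection / len(is_reflection_step)


# Hypothetical labels for a five-step reasoning trace: two reflection steps.
steps = [False, True, False, False, True]
rate = reflection_rate(steps)
print(rate)
```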