concept
active
concept:self-reflection

Self-reflection

The ability of reasoning LLMs to review and revise previous reasoning steps during inference

Neighborhood — ranked by edge-count

Frameworks (1)

framework
  • The proposed framework for probing and steering self-reflection behavior in reasoning LLMs via representation engineering

Methods (2)

method
  • Novel method that applies intervention only when the model begins a new thinking step (at the \n\n delimiter) rather than at every token
  • NoWait
    about
    Baseline method that reduces redundant reflection by directly suppressing corresponding reflection tokens
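The two methods above differ in where they act. A minimal sketch follows, under stated assumptions: `step_start_mask` marks positions whose preceding token ends with the `\n\n` delimiter (one reading of "begins a new thinking step"), and `suppress_tokens` masks a hypothetical banned-token set in the spirit of the NoWait entry. Both function names, the tokenization, and the banned set are illustrative and are not taken from either method's implementation.

```python
import math

STEP_DELIMITER = "\n\n"  # delimiter named in the method entry above


def step_start_mask(tokens):
    """Return True at positions that open a new thinking step.

    Assumption: a position opens a step when the previous token
    ends with the step delimiter.
    """
    mask = []
    prev_was_delimiter = False
    for tok in tokens:
        mask.append(prev_was_delimiter)
        prev_was_delimiter = tok.endswith(STEP_DELIMITER)
    return mask


def suppress_tokens(logits, banned):
    """Return a copy of a token -> logit mapping with banned tokens set to -inf.

    Assumption: `banned` is a hypothetical set of reflection tokens.
    """
    return {
        tok: (-math.inf if tok in banned else score)
        for tok, score in logits.items()
    }


tokens = ["First", " step", ".\n\n", "Wait", ",", " check", "\n\n", "So"]
mask = step_start_mask(tokens)
masked_logits = suppress_tokens({"Wait": 2.0, "So": 1.0}, {"Wait"})
```

In this sketch an intervention would be applied only where `mask` is True, whereas token suppression acts on the output distribution at every decoding step.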

Concepts (4)

concept
  • No Reflection
    related_to
    Reflection level where the model is forced to output an answer immediately without revisiting reasoning.
  • Reflection direction
    associated_with
    A direction in the model's representation space that governs self-reflection behavior, computed as mean difference between reflection and non-reflection embeddings
  • Aha moment
    associated_with
    DeepSeek's description of models autonomously learning to self-reflect during training
  • Inference cost
    associated_with
    Computational expense proportional to number of generated tokens, targeted for reduction by ReflCtrl
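The "Reflection direction" entry above describes a mean difference between reflection and non-reflection embeddings. A minimal sketch of that computation follows, using plain Python lists as stand-ins for embeddings. The `steer` helper, which shifts a hidden state along the direction by a coefficient `alpha`, is an assumption about how such a direction might be used and is not described in the entries above.

```python
def mean_vector(rows):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def reflection_direction(reflection, non_reflection):
    """Mean reflection embedding minus mean non-reflection embedding."""
    mu_reflect = mean_vector(reflection)
    mu_plain = mean_vector(non_reflection)
    return [a - b for a, b in zip(mu_reflect, mu_plain)]


def steer(hidden, direction, alpha):
    """Hypothetical steering step: shift a hidden state along the direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy 2-dimensional embeddings, for illustration only.
reflection = [[1.0, 2.0], [3.0, 4.0]]
non_reflection = [[0.0, 1.0], [2.0, 1.0]]
direction = reflection_direction(reflection, non_reflection)
shifted = steer([0.0, 0.0], direction, -0.5)
```

Under this assumption, a negative `alpha` moves a state away from the reflection side and a positive `alpha` moves it toward it.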

Hypotheses (1)

hypothesis

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
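The selection rule described above (cosine similarity at or above 0.65, excluding entities that already have a typed edge) can be sketched as follows. The function names, the dictionary layout, and the toy vectors are hypothetical, and how the entity embeddings are produced is not specified here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def untyped_candidates(target, others, typed_neighbors, threshold=0.65):
    """Entities similar to `target` that have no typed edge to it.

    `others` maps entity name -> embedding; `typed_neighbors` is the set of
    names already linked by a typed edge. Results are sorted by similarity.
    """
    scored = [
        (name, cosine(target, vec))
        for name, vec in others.items()
        if name not in typed_neighbors
    ]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


others = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 0.1]}
candidates = untyped_candidates([1.0, 0.0], others, typed_neighbors={"c"})
```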

  • Mirror of the self · concept · 0.849
    The phenomenon that objects with more living structure appear to us as more resembling our own eternal self.
  • Selfing · concept · 0.843
    Process of reifying one's identity as an independent self; meditation practices aim to decrease selfing.
  • The specific form of reflection studied, where a model reflects on reasoning generated by another source.
  • Self-report · concept · 0.833
    The model's verbal description of its internal state, which may be accurate or confabulated.
  • self-observation · concept · 0.829
    The ability of a model to observe its own state, measured by Koan Battery; can be lifted by contemplative prompts.
  • A method introduced in Book 1 where observers compare their feeling of self with the life in a candidate thing; Alexander claims it correlates with observed life in thousands of centers.
  • Self-modeling · concept · 0.822
    Ability of a model to predict its own outputs or behavior, sometimes distinguished from introspection.
  • Responses that name or describe the observing act without performing it; negatively correlated with high scores