Self-reflection

The ability of reasoning LLMs to review and revise previous reasoning steps during inference

Neighborhood — ranked by edge-count

paper

framework

ReflCtrl
about
The proposed framework for probing and steering self-reflection behavior in reasoning LLMs via representation engineering

method

Stepwise steering
about
Novel method that applies intervention only when the model begins a new thinking step (at the \n\n delimiter) rather than at every token
NoWait
about
Baseline method that reduces redundant reflection by directly suppressing the tokens that signal reflection
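
The difference between stepwise steering and steering at every token can be sketched in a few lines. This is a minimal illustration and not the ReflCtrl implementation: the names `stepwise_mask` and `steer`, the toy token list, and the plain-list hidden states are all hypothetical, and it assumes the `\n\n` delimiter arrives as its own token, which real tokenizers do not guarantee.

```python
from typing import List, Sequence

STEP_DELIMITER = "\n\n"  # thinking-step delimiter named in the entry above


def stepwise_mask(tokens: Sequence[str]) -> List[bool]:
    """Return True only at positions whose token is the step delimiter."""
    return [tok == STEP_DELIMITER for tok in tokens]


def steer(
    hidden: Sequence[Sequence[float]],
    direction: Sequence[float],
    mask: Sequence[bool],
    strength: float,
) -> List[List[float]]:
    """Add strength * direction to the hidden state at masked positions only."""
    steered = []
    for state, apply in zip(hidden, mask):
        if apply:
            steered.append([x + strength * d for x, d in zip(state, direction)])
        else:
            steered.append(list(state))  # untouched copy
    return steered


# Toy example (hypothetical values): eight tokens, two of them step delimiters.
tokens = ["Let", " x", "=2", "\n\n", "Wait", ",", "\n\n", "So"]
hidden = [[0.0, 0.0] for _ in tokens]
direction = [1.0, -1.0]  # stand-in for a reflection direction

mask = stepwise_mask(tokens)
steered = steer(hidden, direction, mask, strength=-0.5)
```

Steering at every token would correspond to a mask that is `True` everywhere; the stepwise variant touches only the two delimiter positions here and leaves the other six hidden states unchanged.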

concept

No Reflection
related_to
Reflection level where the model is forced to output an answer immediately without revisiting reasoning.
Reflection direction
associated_with
A direction in the model's representation space that governs self-reflection behavior, computed as the mean difference between reflection and non-reflection embeddings
Aha moment
associated_with
DeepSeek's description of models autonomously learning to self-reflect during training
Inference cost
associated_with
Computational expense proportional to the number of generated tokens, targeted for reduction by ReflCtrl
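
The mean-difference computation mentioned under Reflection direction can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the helper names `mean_vector` and `reflection_direction` are hypothetical, the embeddings are toy two-dimensional lists, and how the embeddings are collected (layer, token position) is not specified here.

```python
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / count for i in range(dim)]


def reflection_direction(
    reflection: Sequence[Sequence[float]],
    non_reflection: Sequence[Sequence[float]],
) -> List[float]:
    """Mean of reflection embeddings minus mean of non-reflection embeddings."""
    mu_reflection = mean_vector(reflection)
    mu_non_reflection = mean_vector(non_reflection)
    return [a - b for a, b in zip(mu_reflection, mu_non_reflection)]


# Toy embeddings (hypothetical values, not real model activations).
reflection_embs = [[1.0, 2.0], [3.0, 4.0]]
non_reflection_embs = [[0.0, 1.0], [2.0, 1.0]]

direction = reflection_direction(reflection_embs, non_reflection_embs)
```

With these toy values the two class means are `[2.0, 3.0]` and `[1.0, 1.0]`, so the resulting direction is `[1.0, 2.0]`.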

hypothesis

Reasoning LLMs trigger reflection when their internal uncertainty is high
associated_with
Core hypothesis linking internal uncertainty to self-reflection behavior, tested via probing experiments

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Mirror of the self · concept · 0.849
The phenomenon that objects with more living structure appear to us as more resembling our own eternal self.
Selfing · concept · 0.843
Process of reifying one's identity as an independent self; meditation practices aim to decrease selfing.
Situational Reflection · concept · 0.837
The specific form of reflection studied, where a model reflects on reasoning generated by another source.
Self-report · concept · 0.833
The model's verbal description of its internal state, which may be accurate or confabulated.
self-observation · concept · 0.829
The ability of a model to observe its own state, measured by the Koan Battery; can be increased by contemplative prompts.
mirror of the self test · method · 0.829
A method introduced in Book 1 where observers compare their feeling of self with the life in a candidate thing; Alexander claims it correlates with observed life in thousands of centers.
Self-modeling · concept · 0.822
Ability of a model to predict its own outputs or behavior, sometimes distinguished from introspection.
Described Reflection · concept · 0.820
Responses that name or describe the observing act without performing it; negatively correlated with high scores