concept
active
concept:self-reflection
Self-reflection
The ability of reasoning LLMs to review and revise previous reasoning steps during inference
Neighborhood — ranked by edge-count
Papers (1)
Frameworks (1)
- ReflCtrl (about): The proposed framework for probing and steering self-reflection behavior in reasoning LLMs via representation engineering
Methods (2)
- Stepwise steering (about): Novel method that applies the intervention only when the model begins a new thinking step (at the \n\n delimiter) rather than at every token
- NoWait (about): Baseline method that reduces redundant reflection by directly suppressing the corresponding reflection tokens
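
To make the contrast between stepwise and per-token intervention concrete, here is a minimal sketch in plain Python. It is not ReflCtrl's implementation: the function names, the list-of-floats representation of hidden states, and the choice to intervene at the position of the delimiter token itself are all assumptions made for illustration. Only the \n\n delimiter and the idea of adding a steering direction come from the entries above.

```python
from typing import List, Sequence

STEP_DELIMITER = "\n\n"  # step boundary named in the entry above


def steer_stepwise(
    tokens: Sequence[str],
    hidden: Sequence[Sequence[float]],
    direction: Sequence[float],
    alpha: float,
) -> List[List[float]]:
    """Add alpha * direction only at positions whose token is the step delimiter.

    Hypothetical helper: all other positions are copied through unchanged.
    """
    steered = []
    for tok, h in zip(tokens, hidden):
        if tok == STEP_DELIMITER:
            steered.append([x + alpha * d for x, d in zip(h, direction)])
        else:
            steered.append(list(h))
    return steered


def steer_every_token(
    hidden: Sequence[Sequence[float]],
    direction: Sequence[float],
    alpha: float,
) -> List[List[float]]:
    """Contrast case: apply the same shift at every position."""
    return [[x + alpha * d for x, d in zip(h, direction)] for h in hidden]
```

In this sketch a negative alpha would push representations away from the direction and a positive alpha toward it; the stepwise variant touches far fewer positions than the per-token one.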
Concepts (4)
- No Reflection (related_to): Reflection level where the model is forced to output an answer immediately without revisiting its reasoning.
- Reflection direction (associated_with): A direction in the model's representation space that governs self-reflection behavior, computed as the mean difference between reflection and non-reflection embeddings
- Aha moment (associated_with): DeepSeek's description of models autonomously learning to self-reflect during training
- Inference cost (associated_with): Computational expense proportional to the number of generated tokens, targeted for reduction by ReflCtrl
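
The "mean difference" definition of the reflection direction can be sketched as follows. This is an illustrative reading of that one-line description, not the source's code: the helper names and the plain-list embedding format are assumptions, and any normalization or layer selection the actual method may use is not shown.

```python
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty collection of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def reflection_direction(
    reflection_embs: Sequence[Sequence[float]],
    non_reflection_embs: Sequence[Sequence[float]],
) -> List[float]:
    """Mean of reflection embeddings minus mean of non-reflection embeddings."""
    mu_reflect = mean_vector(reflection_embs)
    mu_plain = mean_vector(non_reflection_embs)
    return [a - b for a, b in zip(mu_reflect, mu_plain)]
```

For example, with reflection embeddings [1, 2] and [3, 4] and non-reflection embeddings [0, 1] and [2, 1], the two means are [2, 3] and [1, 1], so the direction is [1, 2].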
Hypotheses (1)
- Core hypothesis linking internal uncertainty to self-reflection behavior, tested via probing experiments
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The phenomenon that objects with more living structure appear to us as more resembling our own eternal self.
- Process of reifying one's identity as an independent self; meditation practices aim to decrease selfing.
- The specific form of reflection studied, where a model reflects on reasoning generated by another source.
- The model's verbal description of its internal state, which may be accurate or confabulated.
- The ability of a model to observe its own state, measured by Koan Battery; can be lifted by contemplative prompts.
- A method introduced in Book 1 where observers compare their feeling of self with the life in a candidate thing; Alexander claims it correlates with observed life in thousands of centers.
- Ability of a model to predict its own outputs or behavior, sometimes distinguished from introspection.
- Responses that name or describe the observing act without performing it; negatively correlated with high scores
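
The selection rule for this list (cosine similarity of at least 0.65 and no typed edge to the current entity) could be implemented along these lines. This is a hedged sketch: the function names, the dictionary of embeddings, and the representation of typed edges as a set of ID pairs are hypothetical, and only the 0.65 threshold and the "no typed edge" condition come from the page.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_untyped(
    target_id: str,
    embeddings: Dict[str, Sequence[float]],
    typed_edges: Set[Tuple[str, str]],
    threshold: float = 0.65,  # threshold stated on the page
) -> List[str]:
    """IDs with cosine >= threshold to the target and no typed edge to it,
    sorted from most to least similar."""
    target = embeddings[target_id]
    scored = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue
        # Treat edges as undirected for this check (an assumption).
        if (target_id, other_id) in typed_edges or (other_id, target_id) in typed_edges:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            scored.append((score, other_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other_id for _, other_id in scored]
```

Entities returned by such a filter are exactly the ones worth reviewing by hand, since a high score with no typed relation may indicate either a missing edge or a duplicate entry.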