ReflCtrl

The proposed framework for probing and steering self-reflection behavior in reasoning LLMs via representation engineering

Neighborhood — ranked by edge-count

paper

method

Logistic regression correctness probe
uses
Logistic regression trained on the GSM8K training set to predict answer correctness from projection features along the reflection direction
Stepwise steering
uses
Novel method that applies the steering intervention only when the model begins a new thinking step (at the \n\n delimiter), rather than at every token
NoWait
extends
Baseline method that reduces redundant reflection by directly suppressing corresponding reflection tokens
Reflection direction extraction
uses
Computes the reflection direction as the difference between the mean MLP and attention output representations of the first tokens of reflection steps and those of non-reflection steps
Keyword-based reflection step identification
uses
Method to identify reflection steps by searching for specific keywords (e.g., 'Let me think', 'Wait') within reasoning steps
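
The method entries above can be read as one pipeline: split a reasoning trace into steps, flag reflection steps by keyword, take a mean difference of representations, and steer only at step boundaries. The following is a minimal sketch of that reading, not the paper's implementation. The keyword tuple contains only the two examples named above, plain Python lists stand in for MLP and attention outputs, and the `alpha` strength, the `at_step_start` flag, and all function names are hypothetical.

```python
from typing import List, Sequence

# Assumption: only the two example keywords named in the entry above.
REFLECTION_KEYWORDS = ("Let me think", "Wait")
STEP_DELIMITER = "\n\n"


def split_steps(reasoning: str) -> List[str]:
    """Split a reasoning trace into thinking steps at the blank-line delimiter."""
    return [s for s in reasoning.split(STEP_DELIMITER) if s.strip()]


def is_reflection_step(step: str, keywords: Sequence[str] = REFLECTION_KEYWORDS) -> bool:
    """Flag a step as reflection if it contains any of the keywords."""
    return any(k in step for k in keywords)


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty collection of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def reflection_direction(
    reflect_reps: Sequence[Sequence[float]],
    other_reps: Sequence[Sequence[float]],
) -> List[float]:
    """Mean representation of reflection steps minus that of non-reflection steps."""
    mu_reflect = mean_vector(reflect_reps)
    mu_other = mean_vector(other_reps)
    return [a - b for a, b in zip(mu_reflect, mu_other)]


def stepwise_steer(
    hidden: Sequence[float],
    direction: Sequence[float],
    alpha: float,
    at_step_start: bool,
) -> List[float]:
    """Add alpha * direction to the hidden state, but only at the start of a step.

    Hypothetical convention: the caller sets at_step_start when the model has
    just emitted the step delimiter; all other tokens pass through unchanged.
    """
    if not at_step_start:
        return list(hidden)
    return [h + alpha * d for h, d in zip(hidden, direction)]
```

Under this sketch's sign convention, a negative `alpha` pushes the hidden state away from the reflection direction and a positive one pushes toward it. The same direction could also supply the projection features (a dot product with the hidden state) that the correctness probe is described as using.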

concept

Self-reflection
about
The ability of reasoning LLMs to review and revise previous reasoning steps during inference

framework

Representation Engineering
implements
A class of methods that modify how models internally process representations; SOO fine-tuning fits within this framework
DeepSeek-R1
studies
Open-source reasoning LLM from DeepSeek-AI trained with reinforcement learning to exhibit self-reflection

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Reinforcement learning (RL) · concept · 0.735
Machine learning paradigm where agents learn to maximize cumulative reward through interaction.
RL algorithms · concept · 0.725
The different reinforcement learning algorithms used across conditions, to ensure the alignment result is not algorithm-specific.
SkillRL · framework · 0.699
Prior framework recursively expanding a skill library through reinforcement learning
IMTL-L · framework · 0.692
Prior loss-balancing method using learnable loss transformation; logarithm approach recovers this
Reinforcement Learning · framework · 0.666
Alternative framework for agent behavior; based on reward maximization rather than free energy minimization.
Recursive Center Refinement · method · 0.665
The iterative design process in which each center is refined relative to all others until a being-nature emerges; the method section 1 is titled 'Intensifying Shape'.
Causal Representation Learning (CRL) · framework · 0.664
Schölkopf et al.'s framework combining representation learning with causal inference.
Aligned-MTL · method · 0.660
Independent component alignment for multi-task learning.