ReflCtrl (framework:reflctrl) · active
The proposed framework for probing and steering self-reflection behavior in reasoning LLMs via representation engineering
Neighborhood (ranked by edge count)
Papers (1)
Methods (5)
- Logistic regression trained on the GSM8K training set to predict answer correctness from features projected onto the reflection direction
- Novel method that applies the intervention only when the model begins a new thinking step (at the \n\n delimiter) rather than at every token
- NoWait · extends · Baseline method that reduces redundant reflection by directly suppressing the corresponding reflection tokens
- Computes the reflection direction as the difference between the mean representations (MLP and attention outputs) of the first tokens of reflection steps and those of non-reflection steps
- Identifies reflection steps by searching each reasoning step for specific keywords (e.g., 'Let me think', 'Wait')
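The probing side of the methods above (keyword labeling of reflection steps, a mean-difference reflection direction, and a logistic-regression probe on projection features) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not ReflCtrl's implementation: all helper names (`split_steps`, `reflection_direction`, `fit_logistic`, etc.) are hypothetical, representations are plain arrays standing in for one component's output (MLP or attention) at one layer, and the logistic regression is a hand-rolled gradient-descent fit rather than any particular library's.

```python
import numpy as np

# Keywords quoted in the method description; a real detector may use more.
REFLECTION_KEYWORDS = ("Let me think", "Wait")

def split_steps(trace: str) -> list[str]:
    """Split a reasoning trace into thinking steps at the blank-line delimiter."""
    return [s for s in trace.split("\n\n") if s.strip()]

def is_reflection_step(step: str) -> bool:
    """Label a step as reflective if it contains any reflection keyword."""
    return any(kw in step for kw in REFLECTION_KEYWORDS)

def reflection_direction(first_token_reps: np.ndarray, is_reflection) -> np.ndarray:
    """Mean first-token representation of reflection steps minus that of
    non-reflection steps (one component, e.g. MLP or attention output)."""
    mask = np.asarray(is_reflection, dtype=bool)
    return first_token_reps[mask].mean(axis=0) - first_token_reps[~mask].mean(axis=0)

def projection_features(reps: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Scalar projection of each representation onto the unit direction."""
    return reps @ (direction / np.linalg.norm(direction))

def _with_bias(x) -> np.ndarray:
    """Reshape features to (n, k) and prepend a constant column for the bias."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(len(x), -1)
    return np.hstack([np.ones((len(x), 1)), x])

def fit_logistic(x, y, lr: float = 0.1, steps: int = 2000) -> np.ndarray:
    """Batch gradient-descent logistic regression; returns [bias, weights...]."""
    X, y = _with_bias(x), np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w

def predict_proba(w: np.ndarray, x) -> np.ndarray:
    """Predicted probability of the positive class (e.g. a correct answer)."""
    return 1.0 / (1.0 + np.exp(-_with_bias(x) @ w))

if __name__ == "__main__":
    trace = "First, 3 * 4 = 12.\n\nWait, check the units.\n\nSo the answer is 12."
    print([is_reflection_step(s) for s in split_steps(trace)])  # [False, True, False]
```

In the described setup, `x` would hold projection features for GSM8K training problems and `y` whether each final answer was correct; because the description mentions both MLP and attention outputs, the direction can be computed separately for each and the resulting projections stacked as probe features.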
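The steering side (intervening only at the start of a new thinking step, versus the NoWait baseline that suppresses reflection tokens) can be contrasted in a short NumPy sketch. Assumptions not taken from the source: the intervention adds `alpha` times the unit reflection direction to a hidden state (a negative `alpha` pushes away from reflection); the step delimiter appears as a single "\n\n" token; and NoWait-style suppression is modeled as setting the banned token logits to negative infinity. All function names and token ids are hypothetical, and a real implementation would hook into the model's forward pass rather than edit arrays after the fact.

```python
import numpy as np

DELIMITER = "\n\n"  # thinking-step delimiter named in the method description

def step_gated_steer(hidden: np.ndarray, tokens: list[str],
                     direction: np.ndarray, alpha: float) -> np.ndarray:
    """Shift hidden states along the unit reflection direction, but only at
    positions holding the step delimiter (where a new thinking step begins).

    hidden: (seq_len, d_model) activations for one layer (assumed layout).
    tokens: decoded token strings, one per position.
    alpha:  steering strength; negative values push away from reflection.
    """
    unit = direction / np.linalg.norm(direction)
    steered = hidden.copy()  # leave the caller's activations untouched
    at_step_start = np.array([tok == DELIMITER for tok in tokens], dtype=bool)
    steered[at_step_start] += alpha * unit
    return steered

def every_token_steer(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Ungated baseline for comparison: apply the same shift at every position."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

def suppress_tokens(logits: np.ndarray, banned_ids) -> np.ndarray:
    """NoWait-style sketch: make the given reflection token ids unsamplable
    by setting their logits to -inf."""
    out = np.asarray(logits, dtype=float).copy()
    out[..., sorted(banned_ids)] = -np.inf
    return out

if __name__ == "__main__":
    tokens = ["So", "x", "=", "3", ".", DELIMITER, "Wait"]
    hidden = np.zeros((len(tokens), 4))
    direction = np.array([0.0, 2.0, 0.0, 0.0])
    steered = step_gated_steer(hidden, tokens, direction, alpha=-1.0)
    print(np.flatnonzero(steered.any(axis=1)))  # [5]: only the delimiter position
```

The design difference is where the edit lands: the step-gated version touches one position per thinking step, whereas the ungated and token-suppression variants act at every position or decoding step.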
Concepts (1)
- Self-reflection · about · The ability of reasoning LLMs to review and revise their previous reasoning steps during inference
Frameworks (2)
- Representation Engineering · implements · A class of methods that modify how models internally process representations; SOO fine-tuning fits within this framework
- DeepSeek-R1 · studies · Open-source reasoning LLM from DeepSeek-AI, trained with reinforcement learning to exhibit self-reflection
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Machine learning paradigm where agents learn to maximize cumulative reward through interaction.
- The different reinforcement learning algorithms used across conditions to ensure that the alignment result is not algorithm-specific.
- Prior framework recursively expanding a skill library through reinforcement learning
- Prior loss-balancing method using a learnable loss transformation; the logarithm approach recovers this method
- Alternative framework for agent behavior, based on reward maximization rather than free-energy minimization.
- The iterative design process in which each center is refined relative to all others until a being-nature emerges; section 1 of the method is titled 'Intensifying Shape'.
- Schölkopf et al.'s framework combining representation learning with causal inference.
- Independent component alignment for multi-task learning.