framework
active
framework:reflctrl

ReflCtrl

The proposed framework for probing and steering self-reflection behavior in reasoning LLMs via representation engineering

Neighborhood — ranked by edge-count

Methods (5)

method
  • Logistic regression trained on GSM8k training set to predict answer correctness from projection features along reflection direction
  • Novel method that applies intervention only when the model begins a new thinking step (at the \n\n delimiter) rather than at every token
  • NoWait
    extends
    Baseline method that reduces redundant reflection by directly suppressing corresponding reflection tokens
  • Computes reflection direction as mean difference between MLP and attention output representations of first tokens in reflection vs. non-reflection steps
  • Method to identify reflection steps by searching for specific keywords (e.g., 'Let me think', 'Wait') within reasoning steps
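
A minimal sketch of how three of the methods above could fit together: keyword-based identification of reflection steps, a mean-difference reflection direction, and intervention applied only at step starts. It is an illustration under stated assumptions, not the framework's actual code. All function names are hypothetical, the keyword list contains only the two examples given above, and the small arrays stand in for the MLP or attention output representations of each step's first token.

```python
import numpy as np

# Assumption: only the two keywords named in the entry; a real list may be longer.
REFLECTION_KEYWORDS = ("Wait", "Let me think")


def split_steps(reasoning: str) -> list[str]:
    """Split a reasoning trace into thinking steps at the "\n\n" delimiter."""
    return [step for step in reasoning.split("\n\n") if step.strip()]


def is_reflection(step: str) -> bool:
    """Label a step as reflection if it contains any of the keywords."""
    return any(keyword in step for keyword in REFLECTION_KEYWORDS)


def reflection_direction(first_token_reprs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Mean representation of reflection steps minus that of non-reflection steps.

    first_token_reprs: shape (n_steps, d), one vector per step (stand-in data here).
    labels: shape (n_steps,), True where the step is a reflection step.
    """
    labels = labels.astype(bool)
    return first_token_reprs[labels].mean(axis=0) - first_token_reprs[~labels].mean(axis=0)


def steer_at_step_start(hidden: np.ndarray, direction: np.ndarray,
                        strength: float, at_step_start: bool) -> np.ndarray:
    """Shift the representation along the direction only when a new step begins."""
    if not at_step_start:
        return hidden
    return hidden + strength * direction


def projection(hidden: np.ndarray, direction: np.ndarray) -> float:
    """Scalar projection onto the unit-normalized direction (a candidate feature)."""
    return float(hidden @ direction / np.linalg.norm(direction))


trace = "Compute 2+3.\n\nWait, let me check.\n\nSo the answer is 5."
steps = split_steps(trace)
labels = np.array([is_reflection(s) for s in steps])
reprs = np.array([[0.0, 0.0], [2.0, 4.0], [2.0, 0.0]])  # toy vectors, not model outputs
direction = reflection_direction(reprs, labels)
```

A negative `strength` pushes the representation away from the reflection direction and a positive one pushes toward it. Values returned by `projection` are the kind of feature a logistic-regression correctness predictor could take as input, though the exact feature construction is not specified here.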

Concepts (1)

concept
  • The ability of reasoning LLMs to review and revise previous reasoning steps during inference

Frameworks (2)

framework
  • A class of methods that modify how models internally process representations; SOO fine-tuning fits within this framework
  • Open-source reasoning LLM from DeepSeekAI trained with reinforcement learning to exhibit self-reflection

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Machine learning paradigm where agents learn to maximize cumulative reward through interaction.
  • RL algorithms · concept · 0.725
    The different reinforcement learning algorithms used across conditions, to ensure the alignment result is not algorithm-specific.
  • SkillRL · framework · 0.699
    Prior framework recursively expanding a skill library through reinforcement learning
  • IMTL-L · framework · 0.692
    Prior loss-balancing method using learnable loss transformation; logarithm approach recovers this
  • Alternative framework for agent behavior; based on reward maximization rather than free energy minimization.
  • The iterative design process in which each center is refined relative to all others until a being-nature emerges. Method section 1 is titled 'Intensifying Shape'.
  • Schölkopf et al.'s framework combining representation learning with causal inference.
  • Aligned-MTL · method · 0.660
    Independent component alignment for multi-task learning.