thinker:ge-yan
Ge Yan
Authored papers (1)
ReflCtrl demonstrates that self-reflection in reasoning LLMs is governed by an identifiable direction in latent representation space, and that suppressing this direction via stepwise steering can reduce reasoning token usage by up to 33.6% with negligible accuracy loss. The framework extracts a reflection direction as the mean difference in MLP and attention output embeddings between reflection-initiating and non-reflection steps, then injects or suppresses this direction only at reasoning step boundaries (tokens matching "\n\n"), avoiding the representation drift that degrades all-token steering. Across QwQ-32B, DeepSeek-R1 Llama 8B, and DeepSeek-R1 Qwen 14B, evaluated on GSM8k, MATH-500, and three MMLU subsets, stronger models show near-total insensitivity to reflection suppression: QwQ-32B loses only 0.34% accuracy on MATH-500 while cutting tokens by 21.0%, and DS-Qwen-14B loses under 2.3% accuracy on MATH-500 at the maximum suppression setting. A logistic regression classifier trained on reflection-direction projections outperforms one trained on final-layer embeddings at predicting answer correctness (AUROC 0.850 versus 0.716 for DS-Qwen-14B), establishing that uncertainty information is encoded in the reflection direction. The paper argues that this implies self-reflection is triggered by the model's internal perception of uncertainty and that, for capable models, a substantial fraction of reflective steps are computationally redundant, making uncertainty-aware dynamic steering a tractable target for further inference-cost reduction.
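The extraction and stepwise-steering steps described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the plain-list vector representation, the sign convention (negative strength suppresses reflection, positive strength injects it), and the idea of adding the scaled direction directly to a per-token embedding are all assumptions made for illustration. In practice the embeddings would come from a model's MLP and attention outputs.

```python
from typing import List, Sequence

Vector = List[float]

# Reasoning step boundary named in the summary above.
STEP_DELIMITER = "\n\n"


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def reflection_direction(reflect: Sequence[Sequence[float]],
                         non_reflect: Sequence[Sequence[float]]) -> Vector:
    """Mean embedding at reflection steps minus mean at non-reflection steps."""
    mu_r = mean_vector(reflect)
    mu_n = mean_vector(non_reflect)
    return [a - b for a, b in zip(mu_r, mu_n)]


def steer_stepwise(hidden: Sequence[Sequence[float]],
                   tokens: Sequence[str],
                   direction: Sequence[float],
                   strength: float) -> List[Vector]:
    """Add strength * direction only at step-boundary tokens.

    Assumed convention: strength < 0 suppresses, strength > 0 injects.
    All other token positions are returned unchanged.
    """
    steered = []
    for h, tok in zip(hidden, tokens):
        if tok == STEP_DELIMITER:
            steered.append([x + strength * d for x, d in zip(h, direction)])
        else:
            steered.append(list(h))
    return steered


def projection(h: Sequence[float], direction: Sequence[float]) -> float:
    """Scalar projection feature (dot product) usable as classifier input."""
    return sum(x * d for x, d in zip(h, direction))
```

Restricting the edit to delimiter positions is what distinguishes this from all-token steering: embeddings at every other position pass through untouched. The `projection` helper shows the kind of scalar feature a correctness classifier could consume.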
Affiliations (1)
- UC San Diego (institute)
Co-authors (5)
- Sun, Chung-En (6 shared)
- Tsui-Wei (6 shared)
- Weng (6 shared)
- Chung-En Sun (3 shared)
- Tsui-Wei (Lily) Weng (3 shared)
Recent mentions (1)
- papers-typed yan-2025-reflctrl-controlling.md