Ge Yan

orcid 0009-0009-4664-688X openalex A5103821751 name_hash 48951b14d00b57d55e521e25…

Authored papers (1)

ReflCtrl: Controlling LLM Reflection via Representation Engineering (2025)
ReflCtrl shows that self-reflection in reasoning LLMs is governed by an identifiable direction in latent representation space. It also shows that suppressing this direction with stepwise steering can reduce reasoning token usage by up to 33.6% with negligible accuracy loss.

The ReflCtrl framework extracts the reflection direction as a mean difference in MLP and attention output embeddings between reflection-initiating steps and non-reflection steps. It then injects or suppresses this direction only at reasoning-step boundaries (tokens matching "\n\n"). This avoids the representation drift that degrades all-token steering.

The evaluation covers QwQ-32B, DeepSeek-R1 Llama 8B, and DeepSeek-R1 Qwen 14B (DS-Qwen-14B) on GSM8k, MATH-500, and three MMLU subsets. Stronger models are almost entirely insensitive to reflection suppression. QwQ-32B loses only 0.34% accuracy on MATH-500 while cutting tokens by 21.0%. DS-Qwen-14B loses under 2.3% accuracy on MATH-500 at the maximum suppression setting.

A logistic regression classifier trained on projections onto the reflection direction predicts answer correctness better than one trained on final-layer embeddings (AUROC 0.850 versus 0.716 for DS-Qwen-14B). This shows that uncertainty information is encoded in the reflection direction. From this the paper argues three things:
- self-reflection is triggered by the model's internal perception of uncertainty;
- for capable models, a substantial fraction of reflective steps are computationally redundant;
- uncertainty-aware dynamic steering is therefore a tractable target for further inference-cost reduction.
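The direction extraction and step-boundary steering described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the ReflCtrl implementation. All function names here are hypothetical, and the sketch assumes the following:
- it uses one activation vector per step, whereas the paper uses MLP and attention outputs;
- the additive update `h + alpha * d` and the strength parameter `alpha` are an assumed form of steering;
- any token whose text contains "\n\n" counts as a step boundary.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def reflection_direction(reflection_acts: Sequence[Sequence[float]],
                         other_acts: Sequence[Sequence[float]]) -> Vector:
    """Mean activation at reflection-initiating steps minus mean at other steps."""
    mu_reflect = mean_vector(reflection_acts)
    mu_other = mean_vector(other_acts)
    return [a - b for a, b in zip(mu_reflect, mu_other)]


def is_step_boundary(token_text: str) -> bool:
    # Assumption: a token containing a blank line (two newlines) ends a reasoning step.
    return "\n\n" in token_text


def steer(hidden: Sequence[float], direction: Sequence[float], alpha: float) -> Vector:
    """Add alpha * direction to one activation (assumed form; alpha < 0 suppresses)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def steer_sequence(tokens: Sequence[str], hiddens: Sequence[Sequence[float]],
                   direction: Sequence[float], alpha: float) -> List[Vector]:
    """Steer only at step-boundary tokens; other activations are copied unchanged."""
    return [steer(h, direction, alpha) if is_step_boundary(t) else list(h)
            for t, h in zip(tokens, hiddens)]


def projection(hidden: Sequence[float], direction: Sequence[float]) -> float:
    """Scalar projection of an activation onto the unit-normalized direction."""
    norm = sum(d * d for d in direction) ** 0.5
    return sum(h * d for h, d in zip(hidden, direction)) / norm


if __name__ == "__main__":
    d = reflection_direction([[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [2.0, 1.0]])
    print(d)  # [1.0, 2.0]
    tokens = ["Wait", "\n\n", "so"]
    hiddens = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    print(steer_sequence(tokens, hiddens, d, alpha=-0.5))
    # [[0.0, 0.0], [0.5, 0.0], [2.0, 2.0]]
```

With `alpha < 0`, only the boundary token's activation is pushed against the reflection direction, and every other token is left untouched. That restriction is what separates stepwise steering from all-token steering. The scalar returned by `projection` is the kind of per-step feature a correctness classifier could be trained on.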

Affiliations (1)

UC San Diego (institute)

Co-authors (5)

Sun, Chung-En (6 shared)
Tsui-Wei (6 shared)
Weng (6 shared)
Chung-En Sun (3 shared)
Tsui-Wei (Lily) Weng (3 shared)

Recent mentions (1)

papers-typed
yan-2025-reflctrl-controlling.md