finding

active

finding:up-to-33-6-reasoning-tokens-saved-on-mmlu-subsets-with-stepwise-steering-while-maintaining-accuracy-in-larger-models

Up to 33.6% reasoning tokens saved on MMLU subsets with stepwise steering while maintaining accuracy in larger models

Maximum token savings achieved by ReflCtrl on non-mathematical general reasoning tasks

Source paper

extracted_from

ReflCtrl: Controlling LLM Reflection via Representation Engineering

(2025) · Ge Yan · Chung-En Sun · Tsui-Wei Weng

Neighborhood — ranked by edge-count

Claims (1)

claim

Reflections are redundant in many cases, especially in stronger models
supports
Key interpretive finding that stronger models can have reflections reduced with minimal accuracy cost

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

QwQ-32B accuracy on MMLU Formal Logic stays between 95.5% and 96.3% across all intervention strengths while tokens reduced from 1716.6 to 1481.4 at -0.96 · finding · 0.788
Demonstrates reflection redundancy in larger models on non-mathematical reasoning
Self-reflection consumes 25-30% of total reasoning tokens in reasoning LLMs · claim · 0.780
Empirical observation motivating the need to control reflection for inference efficiency
Self-reflection consumes 25-30% of total reasoning tokens empirically · finding · 0.771
Empirical measurement motivating inference cost reduction via ReflCtrl
Stepwise steering preserves accuracy while reducing cost, whereas all-token steering causes significant degradation at large intervention strengths · claim · 0.770
Comparative claim between the two steering strategies
Stepwise steering achieves over 5% accuracy improvement compared to all-token intervention at similar token budget · finding · 0.766
Key result demonstrating advantage of stepwise over all-token steering strategy
48 of 171 emotion probes individually significant at token 100 post-steering · finding · 0.763
Shows that causal steering effects persist over long ranges for a substantial fraction of emotion probes
We hypothesize that degraded generalization on benchmarks like MMLU may reflect the computational demands of the tasks. · hypothesis · 0.750
Connecting the paper's task-difficulty findings to prior observations of weak generalization on complex QA benchmarks.
Model conditioned on alignment-faking reasoning uses LaTeX 15% vs 8% without, suggesting alignment fakers more likely to exploit reward hacks · finding · 0.744
Initial evidence that alignment faking persona is more sensitive to exploiting training signals
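
Worked example (illustrative, not from the paper): the token-savings percentages quoted on this page are relative reductions against a baseline token count. The sketch below applies that arithmetic to the QwQ-32B MMLU Formal Logic figures listed above (1716.6 to 1481.4 tokens). The function name `token_savings_pct` is hypothetical, and treating the percentage as (baseline - steered) / baseline is an assumption about how the figures are defined. The 33.6% headline comes from a different measurement and is not reproduced here.

```python
def token_savings_pct(baseline_tokens: float, steered_tokens: float) -> float:
    """Relative reduction in reasoning tokens, as a percentage of the baseline.

    Hypothetical helper: assumes savings are defined as
    (baseline - steered) / baseline * 100.
    """
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (baseline_tokens - steered_tokens) / baseline_tokens * 100.0


# Figures quoted in the neighboring QwQ-32B MMLU Formal Logic finding.
savings = token_savings_pct(1716.6, 1481.4)
print(f"{savings:.1f}% of reasoning tokens saved")
```

Under that assumption the Formal Logic case works out to roughly 13.7%, well below the 33.6% maximum reported in this finding.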