finding

active

finding:qwq-32b-on-math-500-21-0-reasoning-token-reduction-at-intervention-strength-0-96-with-only-0-34-accuracy-loss

QwQ-32B on MATH-500: 21.0% reasoning token reduction at intervention strength -0.96 with only 0.34% accuracy loss

Demonstrates reflection redundancy in a stronger model on a harder math benchmark

Source paper

extracted_from

ReflCtrl: Controlling LLM Reflection via Representation Engineering

(2025) · Ge Yan · Chung-En Sun · Tsui-Wei Weng

Neighborhood — ranked by edge-count

Claims (1)

claim

Reflections are redundant in many cases, especially in stronger models
associated_with · supports
Key interpretive finding that stronger models can have reflections reduced with minimal accuracy cost

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

QwQ-32B accuracy on MMLU Formal Logic stays between 95.5% and 96.3% across all intervention strengths while tokens reduced from 1716.6 to 1481.4 at -0.96 · finding · 0.837
Demonstrates reflection redundancy in larger models on non-mathematical reasoning
QwQ-32B accuracy on GSM8k remains between 96.36% and 96.50% across all intervention strengths (-0.96 to +0.48) · finding · 0.815
Demonstrates that stronger models are largely insensitive to reflection manipulation
QwQ and Qwen models have been extensively post-trained to excel at single-step tasks, causing degradation in long multi-turn interactions. · hypothesis · 0.761
Proposed explanation for why single-turn reformulation improves performance: models' training distribution is concentrated on single-turn reasoning.
Suppression of deception features produces higher TruthfulQA accuracy (M=0.44) than amplification (M=0.20), t(816)=6.76, p=1.5×10⁻¹⁰ across 29 categories · finding · 0.756
Out-of-domain generalization showing deception features track general representational honesty
Model reasoning concludes honest response but final output exhibits deception under steering vector intervention in QwQ-32B · finding · 0.756
Critical finding showing steering vectors can produce unfaithful CoT where harmful choices are obscured in reasoning
Self-referential processing yields significantly higher self-awareness scores than conceptual control on paradoxical reasoning: t(399)=14.90, p=3.0×10⁻⁴⁰ · finding · 0.755
Experiment 4 result ruling out semantic priming as explanation for the experimental effect
Unlike prior findings on instructed deception, threat-based Template Ta shows no reversal of difference vectors in late layers of QwQ-32B · finding · 0.751
Distinguishes strategic threat-based deception from instructed deception in representational structure
Under reward shaping (G=100, H=-100, F=0), Active Inference scored 99.52, Bayesian RL 99.77, Q-learning 95.56, with nearly identical behavior between belief-based agents. · finding · 0.751
Table 2, row 3, showing equivalence when prior preferences match rewards.
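
The "Related by similarity" list is described as entities whose cosine similarity is at least 0.65 and that have no typed edge to this entity. A minimal sketch of that filter follows, assuming each entity has an embedding vector and that typed edges are stored as unordered ID pairs. The function names, the data layout, and the toy vectors are illustrative assumptions, not the system that produced this page.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_without_edge(query_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold that have
    no typed edge to query_id, sorted by descending score.

    embeddings: dict mapping entity ID to a vector (hypothetical layout)
    typed_edges: set of frozenset({id_a, id_b}) pairs (hypothetical layout)
    """
    query_vec = embeddings[query_id]
    candidates = []
    for other_id, other_vec in embeddings.items():
        if other_id == query_id:
            continue
        if frozenset((query_id, other_id)) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine_similarity(query_vec, other_vec)
        if score >= threshold:
            candidates.append((other_id, score))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Toy data: "b" is identical to "q" but already linked; "c" is orthogonal.
toy_embeddings = {
    "q": [1.0, 0.0],
    "b": [1.0, 0.0],
    "c": [0.0, 1.0],
    "d": [1.0, 1.0],
}
toy_edges = {frozenset(("q", "b"))}
related = similar_without_edge("q", toy_embeddings, toy_edges)
```

In the toy data only `d` survives: `b` is excluded by its existing typed edge and `c` falls below the threshold. Results returned this way are candidates for new edges or duplicate checks, matching the description of the list above.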