finding
active
finding:no-reflection-with-answer-achieves-accuracy-037-on-gsm8k-adv-for-qwen2-5-3b
No Reflection with 'Answer' achieves accuracy .037 on gsm8k_adv for Qwen2.5-3B
Baseline accuracy when reflection is suppressed.
Source paper
extracted_from · (2025) · Chang, Fu-Chieh · Lee, Yu-Ting · Wu, Pei-Yuan
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Triggered Reflection with 'Alternatively' achieves accuracy .684 on gsm8k_adv for Gemma3-4B-IT · finding · 0.852 · Highest single-instruction accuracy result in the paper.
- Supports claim that uncertainty is encoded in reflection direction
- Demonstrates that stronger models are largely insensitive to reflection manipulation
- Easy questions (acc > 80%) have average reflection rate of 25.8% for DeepSeek-R1 Llama 8b on GSM8k · finding · 0.795 · Baseline reflection rate for easy questions confirming difficulty-reflection correlation
- Core empirical result validating the three-level reflection framework on code reasoning.
- High cosine similarity for Gemma3 steering vectors suggests strong linear reflection structure.
- Only model showing marginal benefit from increased reflection, at substantial token cost
- Author's interpretation of the negative correlation between reflection rate and accuracy observed in Fig. 5