finding
active
finding:reflection-direction-features-achieve-auroc-0-772-vs-0-736-for-final-layer-baseline-on-deepseek-llama-8b-on-gsm8k-correctness-prediction
Reflection-direction features achieve AUROC 0.772 vs. 0.736 for the final-layer baseline on deepseek-llama-8b for GSM8k correctness prediction
Supports the claim that uncertainty is encoded in the reflection direction.
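To make the comparison concrete, here is a minimal sketch of how an AUROC for correctness prediction can be computed from scalar scores. It assumes each answer gets one score and a binary label (1 = correct). The `project` helper, the direction vector, and the toy numbers are hypothetical illustrations, not the paper's probing pipeline or data; the paper may instead train a probe over these features.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def project(hidden: Sequence[float], direction: Sequence[float]) -> float:
    """Hypothetical feature: dot product of a hidden state with a reflection direction."""
    return sum(h * d for h, d in zip(hidden, direction))


# Toy usage (made-up numbers): 2 correct answers, 3 incorrect ones.
toy_scores = [0.9, 0.4, 0.3, 0.6, 0.5]
toy_labels = [1, 1, 0, 0, 0]
print(auroc(toy_scores, toy_labels))  # 4 of 6 pairs ranked correctly
```

An AUROC of 0.5 is chance level and 1.0 is perfect separation, so under this reading 0.772 vs. 0.736 means the reflection-direction features separate correct from incorrect answers somewhat better than the final-layer features.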
Source paper
extracted_from (2025) · Ge Yan · Chung-En Sun · Tsui-Wei Weng
Neighborhood · ranked by edge count
Claims (1)
claim
- Interpretive claim from a probing experiment showing that reflection-direction features outperform the baseline for uncertainty prediction
Hypotheses (1)
hypothesis
- Core hypothesis linking internal uncertainty to self-reflection behavior, tested via probing experiments
Questions (1)
question
- First central research question motivating the ReflCtrl investigation
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Easy questions (accuracy > 80%) have an average reflection rate of 25.8% for DeepSeek-R1 Llama 8b on GSM8k · finding · 0.834 · Baseline reflection rate for easy questions, confirming the difficulty-reflection correlation
- Baseline accuracy when reflection is suppressed.
- Attribution finding suggesting the last layer directly controls reflection keyword generation
- Triggered Reflection with 'Alternatively' achieves accuracy 0.684 on gsm8k_adv for Gemma3-4B-IT · finding · 0.804 · Highest single-instruction accuracy result in the paper.
- Only model showing a marginal benefit from increased reflection, at a substantial token cost
- Core empirical result validating the three-level reflection framework on code reasoning.
- Core empirical finding about layer-dependent truth direction emergence across task types.
- Reflection-inducing directions emerge more clearly in higher layers (ℓ > 5) for both models and datasets · finding · 0.783 · Empirical observation about which network layers encode reflection-relevant information.