finding
active
finding:clear-accuracy-stratification-across-three-reflection-levels-on-cruxeval-o-adv-triggered-065-247-intrinsic-040-133-no-reflection-017-051-for-qwen2-5-3b-gemma3-4b-it
Clear accuracy stratification across three reflection levels on cruxeval_o_adv: Triggered (.065/.247) > Intrinsic (.040/.133) > No Reflection (.017/.051) for Qwen2.5-3B/Gemma3-4B-IT
Core empirical result validating the three-level reflection framework on code reasoning.
Source paper
extracted_from · (2025) · Chang, Fu-Chieh · Lee, Yu-Ting · Wu, Pei-Yuan
Neighborhood — ranked by edge-count
Claims (1)
claim
- Central interpretive claim of the paper, supported by steering vector experiments.
Questions (1)
question
- Second key research question motivating the latent direction analysis.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Triggered Reflection with 'Alternatively' achieves accuracy .684 on gsm8k_adv for Gemma3-4B-IT · finding · 0.802 · Highest single-instruction accuracy result in the paper.
- Supports claim that uncertainty is encoded in reflection direction
- Baseline accuracy when reflection is suppressed.
- Validates robustness of alignment metric choice
- Empirical interpretation of which reference baseline yields more useful steering vectors.
- SOO fine-tuning did not collapse Gemma-2-27B self-other distinction needed for perspective-taking
- Out-of-domain generalization showing deception features track general representational honesty
- Thought detection peaks at ~2/3 layer depth; intention checking peaks at ~1/2 layer depth · finding · 0.755 · Lindsey (2026) differential layer performance explained by Janus's path combinatorics: different tasks use different path distributions.