hypothesis

active

hypothesis:the-model-tends-to-reflect-more-when-the-question-is-difficult-and-accuracy-is-generally-lower-for-harder-questions

The model tends to reflect more when the question is difficult, and accuracy is generally lower for harder questions

Hypothesis explaining the negative correlation between reflection rate and accuracy without implying that reflection is harmful

Source paper

extracted_from

ReflCtrl: Controlling LLM Reflection via Representation Engineering

(2025) · Ge Yan · Chung-En Sun · Tsui-Wei Weng

Neighborhood — ranked by edge-count

Findings (1)

finding

Easy questions (accuracy > 80%) have an average reflection rate of 25.8% for DeepSeek-R1-Distill-Llama-8B on GSM8K
supports
Baseline reflection rate for easy questions confirming difficulty-reflection correlation
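The per-difficulty averaging behind this finding can be sketched as follows. This is a minimal, hypothetical sketch, not the paper's code: it assumes each question already has a per-question accuracy and a per-question reflection rate (the paper's exact definition of reflection rate may differ). Only the 80% "easy" threshold comes from the finding; the "hard" cutoff, the `reflection_rate_by_difficulty` helper, and the toy data are invented for illustration.

```python
from statistics import mean


def reflection_rate_by_difficulty(questions, easy_threshold=0.8, hard_threshold=0.2):
    """Average per-question reflection rate within each difficulty bucket.

    questions: iterable of (accuracy, reflection_rate) pairs, both in [0, 1].
    Difficulty is inferred from accuracy: easy if accuracy > easy_threshold,
    hard if accuracy < hard_threshold, medium otherwise. The 0.8 easy cutoff
    mirrors the finding; the 0.2 hard cutoff is a hypothetical choice.
    """
    buckets = {"easy": [], "medium": [], "hard": []}
    for acc, refl in questions:
        if acc > easy_threshold:
            buckets["easy"].append(refl)
        elif acc < hard_threshold:
            buckets["hard"].append(refl)
        else:
            buckets["medium"].append(refl)
    # Skip empty buckets so mean() is never called on an empty list.
    return {name: mean(rates) for name, rates in buckets.items() if rates}


# Invented toy data (not from the paper): (accuracy, reflection_rate) per question.
toy = [(0.9, 0.20), (1.0, 0.30), (0.5, 0.40), (0.1, 0.60), (0.0, 0.70)]
print(reflection_rate_by_difficulty(toy))
```

Comparing the "easy" average against the harder buckets is what lets a figure like 25.8% serve as a baseline for the difficulty-reflection relationship.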

Claims (1)

claim

Higher reflection frequency correlates with lower accuracy partly because more reflections are generated on difficult questions
supports
Authors' interpretation of the negative correlation between reflection rate and accuracy observed in Fig. 5

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Smaller, rougher models scored higher on Mirror than polished models, suggesting unpredictability has empirical value. · claim · 0.823
Larger models should amplify bias less than smaller models, with model biases more accurately reflecting data biases rather than exacerbating them · claim · 0.823
Implication of PRH for AI fairness and bias
Roughness in responses decreases with parameter count within same-alignment model families, operationalizing the cost of polishing. · claim · 0.804
Bigger models are more likely to converge to a shared representation than smaller models · hypothesis · 0.797
Selective pressure toward convergence via model capacity
How does reflection influence the model's reasoning performance? · question · 0.793
Second central research question motivating ReflCtrl investigation
Within each difficulty category, correctness rate is not correlated with reflection rate, suggesting reflection may be redundant · claim · 0.790
Per-category analysis showing reflection rate does not help within difficulty class
Earlier/less capable models exhibit a larger gap between "think" and "don't think" representation strength · finding · 0.786
Claude 3 models show a bigger difference than newer models like Opus 4.1.
We hypothesize that explicitly instructing the model to evaluate the correctness of the given statement may change the geometry of truth directions. · hypothesis · 0.785
Motivating hypothesis for Section 5's investigation of prompt template effects.