claim
active
claim:higher-reflection-frequency-correlates-with-lower-accuracy-partly-because-more-reflections-are-generated-on-difficult-questions
Higher reflection frequency correlates with lower accuracy partly because more reflections are generated on difficult questions
Author's interpretation of the negative correlation between reflection rate and accuracy observed in Fig. 5
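The confounding argument behind this claim can be illustrated with a toy calculation. The sketch below is not the paper's data or analysis: the records are hypothetical, hand-built so that hard questions receive more reflections and have lower accuracy, while reflection count has no relation to correctness within either difficulty class. The names `records`, `pearson`, and `corr_for` are illustrative assumptions.

```python
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical toy records: (difficulty, reflection_count, correct).
# Easy questions: few reflections, 75% correct at every reflection count.
# Hard questions: many reflections, 25% correct at every reflection count.
records = (
    [("easy", 1, c) for c in (1, 1, 1, 0)]
    + [("easy", 2, c) for c in (1, 1, 1, 0)]
    + [("hard", 4, c) for c in (1, 0, 0, 0)]
    + [("hard", 5, c) for c in (1, 0, 0, 0)]
)

def corr_for(rows):
    """Correlation between reflection count and correctness for the given rows."""
    return pearson([r[1] for r in rows], [r[2] for r in rows])

pooled_r = corr_for(records)
easy_r = corr_for([r for r in records if r[0] == "easy"])
hard_r = corr_for([r for r in records if r[0] == "hard"])

print(f"pooled: {pooled_r:.3f}, easy only: {easy_r:.3f}, hard only: {hard_r:.3f}")
```

In this toy set the pooled correlation is negative while both within-class correlations are zero, so the negative pooled value comes entirely from difficulty driving both reflection count and error rate. This shows only that such a pattern is possible, not that it holds in the paper's Fig. 5.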
Source paper
extracted_from (2025) · Ge Yan · Chung-En Sun · Tsui-Wei Weng
Neighborhood — ranked by edge-count
Hypotheses (1)
hypothesis
- Hypothesis explaining negative correlation between reflection rate and accuracy without implying reflection is harmful
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Per-category analysis showing reflection rate does not help within difficulty class
- Out-of-domain generalization showing deception features track general representational honesty
- Theoretical limitation identified by the authors distinguishing reflection from stylistic tasks.
- Question raised by the discrepancy between DAS IIA and linear probe accuracy in Case Study II
- Shows interpretability correlates with activation strength, most model effect comes from high activations
- Key asymmetry finding interpreted mechanistically by the authors.
- Reflection-inducing directions emerge more clearly in higher layers (ℓ > 5) for both models and datasets · finding · 0.766 · Empirical observation about which network layers encode reflection-relevant information.