question
active
question:under-what-conditions-does-chain-of-thought-reflect-genuine-uncertainty-resolution-versus-a-learned-performance
Under what conditions does chain-of-thought reflect genuine uncertainty resolution versus a learned performance?
Key question addressed by the task difficulty analysis comparing MMLU and GPQA-Diamond
Source paper
extracted_from (2026) · Siddharth Boppana · Annabel Ma · Max Loeffler · Raphaël Sarfati +4
Neighborhood — ranked by edge-count
Claims (1)
claim
- Task difficulty as the key variable distinguishing the two modes of CoT identified in the paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Central research question motivating the paper
- A small number of high-quality human demonstrations of chain-of-thought reasoning could be used to improve and focus performance. · hypothesis · 0.810 · Section 6 mentions high-quality human demos could improve natural language feedback.
- Medium through which eval awareness is often verbalized; target of intervention.
- The core interpretive question the paper narrows but cannot definitively answer
- Cited regarding possibility of encoding misaligned reasoning in benign chains-of-thought
- Figure 4 shows CoT improves over zero-shot, and ensembled CoT further boosts accuracy.
- CoT improves accuracy on HHH evals and makes the decision process legible.