claim
active
claim:task-difficulty-moderates-whether-cot-is-performative-or-genuine-easy-recall-questions-show-performative-cot-difficult-multihop-questions-show-genuine-reasoning
Task difficulty moderates whether CoT is performative or genuine: easy recall questions show performative CoT, difficult multihop questions show genuine reasoning
Task difficulty as the key variable distinguishing the two modes of CoT identified in the paper
Source paper
extracted_from (2026) · Siddharth Boppana · Annabel Ma · Max Loeffler · Raphaël Sarfati +4
Neighborhood — ranked by edge-count
Papers (1)
paper
Findings (1)
finding
- Empirical finding contrasting difficult questions with easy ones, supporting genuine reasoning on hard tasks
Questions (1)
question
- Key question addressed by the task difficulty analysis comparing MMLU and GPQA-Diamond
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Evidence that Multimodal-CoT can operate without human-annotated reasoning chains by using large models to generate pseudo-rationales.
- High-level policy-relevant claim about the risks of advanced reasoning in LLMs
- Key comparative finding showing activation probes outperform text-level monitors for early answer detection
- Evidence that multimodal information accelerates convergence speed during training.
- Comparative finding establishing activation probing as superior to text-level monitoring for early belief detection
- Finding that explicit correctness framing partially aligns truth directions across task families.
- The central empirical claim of the paper, supported by activation probing evidence
- Key improvement in cross-task generalization enabled by explicit instruction framing.