dataset
archived
dataset:mmlu
MMLU
Benchmark used to evaluate performative reasoning. As the easier task, it shows significantly more performative reasoning than GPQA-Diamond.
Neighborhood — ranked by edge-count
Papers (2)
paper
Methods (1)
method
- Quantitative study correlating layer-wise anchoring geometry (S_max, AUS_N) with behavioral thresholds θ50
Findings (2)
finding
- Quantitative efficiency result on the hard benchmark; the smaller reduction reflects a genuine need for reasoning
- Core empirical result demonstrating early belief formation in easy tasks
Claims (1)
claim
- Task difficulty as the key variable distinguishing the two modes of CoT identified in the paper