dataset
archived
dataset:mmlu

MMLU

Benchmark used to evaluate performative reasoning. MMLU, the easier of the two tasks, shows significantly more performative reasoning than GPQA-Diamond.

Neighborhood (ranked by edge count)

Methods (1)


Findings (2)


Claims (1)
