finding
active
finding:haiku-4-5-achieves-the-largest-harness-benefit-on-skillsbench-15-1-pp-despite-mid-tier-base-capability-of-5-8

Haiku 4.5 achieves the largest harness benefit on SkillsBench (15.1 pp) despite a mid-tier base capability of 5.8%

Shows that the low-base regime on SkillsBench (SB) is more variable than on SWE: Haiku 4.5 benefits far more than Qwen3-235B despite similar base rates.

Source paper

extracted_from
Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents
(2026) · Minhua Lin · Juncheng Wu · Zijun Wang · Zhan Shi +13
