finding
active
finding:qwen3-5-9b-evolver-achieves-highest-harness-updating-gain-on-skillsbench-3-8-pp-exceeding-claude-opus-4-6-2-3-pp-and-qwen3-235b-1-5-pp

Qwen3.5-9B evolver achieves highest harness-updating gain on SkillsBench (3.8 pp), exceeding Claude Opus 4.6 (2.3 pp) and Qwen3-235B (1.5 pp)

Case demonstrating that model scale does not predict harness-updating quality

Source paper

extracted_from
Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents
(2026) · Minhua Lin · Juncheng Wu · Zijun Wang · Zhan Shi +13
