finding
active
finding:on-swe-bench-harness-benefit-peaks-at-qwen3-235b-19-3-pp-while-weaker-qwen3-32b-gains-only-4-4-pp-and-stronger-opus-4-6-gains-only-2-6-pp

On SWE-bench, harness-benefit peaks at Qwen3-235B (19.3 pp), while weaker Qwen3-32B gains only 4.4 pp and stronger Opus 4.6 gains only 2.6 pp

Core finding demonstrating a non-monotonic relationship between base capability and harness-benefit: the gain is largest for the mid-capability model and smaller at both ends.
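
As a minimal sketch of what "non-monotonic" means here, the snippet below takes the three gains quoted in the finding and checks that they neither rise nor fall steadily with capability. The capability ordering (Qwen3-32B, then Qwen3-235B, then Opus 4.6) is taken from the finding's "weaker" and "stronger" wording. The names `gains_pp` and `is_monotonic` are illustrative and do not come from the source paper.

```python
# Harness-benefit gains on SWE-bench in percentage points, as quoted in the finding.
# Order is assumed to follow base capability, weakest to strongest.
gains_pp = [
    ("Qwen3-32B", 4.4),
    ("Qwen3-235B", 19.3),
    ("Opus 4.6", 2.6),
]


def is_monotonic(values):
    """Return True if the sequence never decreases or never increases."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


values = [gain for _, gain in gains_pp]

# The model with the largest gain, and whether the trend is non-monotonic.
peak_model, peak_gain = max(gains_pp, key=lambda item: item[1])
non_monotonic = not is_monotonic(values)

print(f"peak: {peak_model} ({peak_gain} pp), non-monotonic: {non_monotonic}")
```

With only three data points, this check illustrates the shape of the claim and is not a statistical test.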

Source paper

extracted_from
Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents
(2026) · Minhua Lin · Juncheng Wu · Zijun Wang · Zhan Shi +13

Neighborhood — ranked by edge-count

Claims (1)

claim

Questions (1)

question
