finding
active
finding:qwen3-235b-leads-as-evolver-on-swe-bench-with-8-2-pp-harness-updating-gain-but-ranks-last-on-mcp-with-0-6-pp

Qwen3-235B leads as evolver on SWE-bench with an 8.2 pp harness-updating gain but ranks last on MCP with 0.6 pp

Illustrates benchmark-dependent reshuffling of evolver rankings: no evolver dominates across all substrates.

Source paper

extracted_from
Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents
(2026) · Minhua Lin · Juncheng Wu · Zijun Wang · Zhan Shi +13

Neighborhood — ranked by edge-count

Claims (1)

claim

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.