finding
active
finding:on-swe-bench-claude-opus-4-6-and-claude-sonnet-4-6-both-achieve-7-4-pp-harness-updating-gain-claude-haiku-4-5-achieves-8-0-pp

On SWE-bench, Claude Opus 4.6 and Claude Sonnet 4.6 both achieve a 7.4 pp harness-updating gain; Claude Haiku 4.5 achieves 8.0 pp

Full evolver-side SWE-bench results, showing comparable harness-updating gains across the Claude family tiers

Source paper

extracted_from
Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents
(2026) · Minhua Lin · Juncheng Wu · Zijun Wang · Zhan Shi +13
