finding
active
finding:gpt-oss-120b-achieves-5-9-pp-harness-updating-gain-on-swe-bench-lowest-among-all-seven-evolvers

GPT-OSS-120B achieves a 5.9 pp harness-updating gain on SWE-bench, the lowest among all seven evolvers

Part of the full evolver-side matrix, which demonstrates flat but variable harness-updating across models

Source paper

extracted_from
Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents
(2026) · Minhua Lin · Juncheng Wu · Zijun Wang · Zhan Shi +13

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.