finding
active
finding:qwen3-235b-achieves-only-1-1-pp-harness-benefit-on-skillsbench-despite-4-7-base-pass-rate-near-qwen3-32b-s-0-0-baseline

Qwen3-235B achieves only a 1.1 pp harness benefit on SkillsBench despite a 4.7% base pass rate, which is close to Qwen3-32B's 0.0% baseline

Shows that the low-base regime on SkillsBench (SB) is variable: models with similar starting pass rates can realize very different harness benefit.

Source paper

extracted_from
Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents
(2026) · Minhua Lin · Juncheng Wu · Zijun Wang · Zhan Shi +13
