Harness-Updating Gain (Δupdate)

Metric measuring harness-updating capability as the mean pairwise gain across an anchor agent set
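
The description above can be read as an average over anchor agents. A minimal sketch follows, assuming that a "pairwise gain" is the difference between an agent's score with the evolver's updated harness and its score with a baseline harness; the source does not spell out this definition. The function name `delta_update`, the `scores` layout, the `"baseline"` key, and all numbers are hypothetical placeholders, not values from the paper.

```python
from statistics import mean

def delta_update(scores, evolver, anchor_agents, baseline="baseline"):
    """Mean gain of one evolver's harness over the baseline, across anchor agents.

    scores maps (harness_source, agent) -> benchmark score in percent.
    ASSUMPTION: pairwise gain = score with updated harness - score with baseline.
    """
    gains = [
        scores[(evolver, agent)] - scores[(baseline, agent)]
        for agent in anchor_agents
    ]
    return mean(gains)

# Illustrative, made-up scores for three placeholder anchor agents.
anchor_agents = ["agent_a", "agent_b", "agent_c"]
scores = {
    ("baseline", "agent_a"): 50.0,
    ("baseline", "agent_b"): 40.0,
    ("baseline", "agent_c"): 30.0,
    ("evolver_x", "agent_a"): 56.0,
    ("evolver_x", "agent_b"): 43.0,
    ("evolver_x", "agent_c"): 33.0,
}

gain = delta_update(scores, "evolver_x", anchor_agents)
print(f"delta_update(evolver_x) = {gain:.1f} percentage points")
```

Averaging over a fixed agent set means that one evolver's number is not dominated by a single agent that happens to respond unusually well or badly to its updates.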

Neighborhood — ranked by edge-count

paper

concept

Anchor Agent Set
associated_with
Fixed set of representative task-solving agents (Opus 4.6, Sonnet 4.6, Qwen3-235B) used to compute harness-updating capability metrics

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Harness-Benefit Gain (Δbenefit) · method · 0.847
Metric measuring harness-benefit capability as the maximum pairwise gain across a fixed anchor evolver set

Harness-Updating Capability · concept · 0.820
The capability of an evolver model to produce useful persistent harness updates from execution evidence

"harness-updating is flat in base capability: models across capability tiers produce updates that yield similar gains, and even the Qwen3.5-9B evolver induces gains comparable to Claude Opus 4.6" · quote · 0.779
Verbatim summary of first major finding from conclusion

Harness-updating capability is flat in base capability: models from different capability tiers produce harness updates that lead to surprisingly similar gains · claim · 0.774
First major claim of the paper, supported by narrow spread across evolvers and case study

Harness-updating gain spread is at most 3.1 percentage points across all evolvers on any single benchmark · finding · 0.765
Core finding that harness-updating capability does not scale with model base capability

"which models produce useful harness updates?" · question · 0.752
First open question the paper sets out to answer through evolver-side analysis

"which models actually benefit from updated harnesses?" · question · 0.744
Second open question the paper sets out to answer through agent-side analysis

Harness-Benefit Capability · concept · 0.718
The capability of a task-solving agent to benefit from updated harnesses during task solving
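
Two quantities named in the entries above, the maximum pairwise gain behind Δbenefit and the spread of harness-updating gains across evolvers, can be sketched in a few lines. This is an illustration under stated assumptions: a pairwise gain is taken to be the score with an evolver's updated harness minus the score with a baseline harness, and spread is taken to be the largest gain minus the smallest. Neither definition is given in the text here. All names (`delta_benefit`, `gain_spread`, `evolver_x`, `agent_a`) and all numbers are hypothetical.

```python
def delta_benefit(scores, agent, anchor_evolvers, baseline="baseline"):
    """Largest gain one agent gets from any anchor evolver's harness.

    scores maps (harness_source, agent) -> benchmark score in percent.
    ASSUMPTION: pairwise gain = score with updated harness - score with baseline.
    """
    return max(
        scores[(evolver, agent)] - scores[(baseline, agent)]
        for evolver in anchor_evolvers
    )

def gain_spread(update_gains):
    """Spread of per-evolver gains (ASSUMPTION: max minus min, in points)."""
    values = list(update_gains.values())
    return max(values) - min(values)

# Illustrative, made-up scores for one placeholder agent and two evolvers.
scores = {
    ("baseline", "agent_a"): 50.0,
    ("evolver_x", "agent_a"): 56.0,
    ("evolver_y", "agent_a"): 54.5,
}
best_gain = delta_benefit(scores, "agent_a", ["evolver_x", "evolver_y"])

# Illustrative per-evolver harness-updating gains on a single benchmark.
update_gains = {"evolver_x": 4.0, "evolver_y": 2.5}
spread = gain_spread(update_gains)

print(f"delta_benefit(agent_a) = {best_gain:.1f} points, spread = {spread:.1f} points")
```

Under this reading, a small spread across evolvers on a benchmark is what the finding of "at most 3.1 percentage points" describes.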