Harness-Benefit Capability

The capability of a task-solving agent to benefit from updated harnesses during task solving

Neighborhood — ranked by edge-count

paper

framework

Harness Evolution Capability Framework
implements
The paper's conceptual framework decomposing harness self-evolution into harness-updating and harness-benefit capabilities, distinct from base capability

concept

Harness Self-Evolution
associated_with
The process of updating the external agent harness from execution evidence while keeping model weights fixed
Harness Activation Failure
associated_with
A failure mode where weak-tier models fail to invoke relevant harness artifacts (e.g., skills) during task solving
Harness Adherence Failure
associated_with
A failure mode in which weak-tier models fail to follow the guidance of harness artifacts faithfully, even when those artifacts are loaded

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Harness-Benefit Gain (Δbenefit) · method · 0.801
Metric measuring harness-benefit capability as the maximum pairwise gain across a fixed anchor evolver set
Harness-Updating Capability · concept · 0.800
The capability of an evolver model to produce useful persistent harness updates from execution evidence
Harness-benefit is non-monotonic in base capability: weak-tier models benefit little, mid-tier models benefit most, and strong-tier models benefit less than mid-tier · claim · 0.792
Second major claim of the paper, supported by Δbenefit measurements across six models on three benchmarks
Harness-updating capability is flat in base capability: models from different capability tiers produce harness updates that lead to surprisingly similar gains · claim · 0.773
First major claim of the paper, supported by the narrow spread of gains across evolvers and by a case study
Skills (Harness Artifact) · concept · 0.765
Reusable procedural modules packaged as callable harness artifacts that can be invoked by agents during task solving
Agent Harness · concept · 0.754
The external non-parametric context and infrastructure (prompts, skills, memories, tools) through which an LLM is deployed for task execution
Which models actually benefit from updated harnesses? · question · 0.743
Second open question the paper sets out to answer through agent-side analysis
Prompts (Harness Artifact) · concept · 0.735
Natural-language harness artifacts that encode standing behavioral rules, task policies, and reasoning procedures
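
The Harness-Benefit Gain (Δbenefit) entry above describes the metric only as "the maximum pairwise gain across a fixed anchor evolver set". A minimal sketch of one possible reading follows. It assumes that a pairwise gain is the solver's score with a harness updated by one anchor evolver minus the same solver's score with the original harness; that reading, the function name delta_benefit, the evolver labels, and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Mapping


def delta_benefit(baseline_score: float,
                  scores_by_evolver: Mapping[str, float]) -> float:
    """Return the largest gain over the baseline across the anchor evolver set.

    Assumption (not confirmed by the source): a "pairwise gain" is the
    solver's score with one evolver's updated harness minus the solver's
    score with the original harness.
    """
    if not scores_by_evolver:
        raise ValueError("the anchor evolver set must not be empty")
    return max(score - baseline_score for score in scores_by_evolver.values())


# Made-up illustrative scores, in percent, for a single solver model.
baseline = 40.0
updated = {"evolver_a": 46.0, "evolver_b": 52.0, "evolver_c": 44.0}
print(delta_benefit(baseline, updated))  # 12.0
```

Under this reading, keeping the anchor evolver set fixed while varying the solver makes the resulting values comparable across solver models, since every solver is offered harness updates from the same evolvers.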