Harness-Following Rate

The fraction of skill-loaded trajectories that an LLM judge rates as following the loaded skill's guidance
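As a minimal sketch of the definition above, the rate can be computed as a ratio over judged trajectories. The `Trajectory` record and its field names are hypothetical, introduced only for illustration, and the sketch assumes each skill-loaded trajectory already carries a boolean verdict from the judge; how that verdict is produced is not shown here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Trajectory:
    # Hypothetical record; field names are assumptions, not from the source.
    skill_loaded: bool               # whether the agent loaded a skill in this run
    judged_following: bool = False   # judge verdict: did the agent follow the skill?


def harness_following_rate(trajectories: Iterable[Trajectory]) -> Optional[float]:
    """Fraction of skill-loaded trajectories judged as following the skill.

    Returns None when no trajectory loaded a skill, since the rate is
    undefined on an empty denominator.
    """
    loaded = [t for t in trajectories if t.skill_loaded]
    if not loaded:
        return None
    followed = sum(1 for t in loaded if t.judged_following)
    return followed / len(loaded)
```

Trajectories that never loaded a skill are excluded from both numerator and denominator, so the metric reflects adherence only and is not affected by how often skills are activated.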

Neighborhood — ranked by edge-count

paper

method

Harness-Following Rate Measurement
implements
LLM-judge pipeline measuring the fraction of skill-loaded trajectories in which the agent follows the loaded skill's guidance, using Claude Sonnet 4.6 as the judge

concept

Harness Adherence Failure
associated_with
A failure mode in which weak-tier models fail to follow the guidance of harness artifacts faithfully, even when those artifacts are loaded

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Agent Harness · concept · 0.713
The external non-parametric context and infrastructure (prompts, skills, memories, tools) through which an LLM is deployed for task execution
Harness-Updating Capability · concept · 0.702
The capability of an evolver model to produce useful persistent harness updates from execution evidence
Harness Activation Failure · concept · 0.700
A failure mode where weak-tier models fail to invoke relevant harness artifacts (e.g., skills) during task solving
Prompts (Harness Artifact) · concept · 0.695
Natural-language harness artifacts that encode standing behavioral rules, task policies, and reasoning procedures
Harness-Updating Gain (Δupdate) · method · 0.690
Metric measuring harness-updating capability as the mean pairwise gain across an anchor agent set
Pass-When-Loaded Rate · concept · 0.687
The pass rate among a model's skill-loaded trajectories, measuring outcome conditioned on harness activation
Harness-Benefit Capability · concept · 0.684
The capability of a task-solving agent to benefit from updated harnesses during task solving
which models produce useful harness updates? · question · 0.683
First open question the paper sets out to answer through evolver-side analysis
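
Among the neighbors listed above, Pass-When-Loaded Rate is conditioned on the same skill-loaded subset but counts task outcomes instead of judge verdicts. The sketch below illustrates that reading; the `(skill_loaded, passed)` pair layout and the function name are assumptions made for illustration, not taken from the source.

```python
from typing import Iterable, Optional, Tuple


def pass_when_loaded_rate(records: Iterable[Tuple[bool, bool]]) -> Optional[float]:
    """Pass rate among skill-loaded trajectories.

    Each record is a hypothetical (skill_loaded, passed) pair. Trajectories
    that did not load a skill are ignored; None is returned if none did.
    """
    outcomes = [passed for skill_loaded, passed in records if skill_loaded]
    if not outcomes:
        return None
    # True counts as 1 and False as 0, so the sum is the number of passes.
    return sum(outcomes) / len(outcomes)
```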