Agent Harness

The external non-parametric context and infrastructure (prompts, skills, memories, tools) through which an LLM is deployed for task execution

Neighborhood — ranked by edge-count

concept

Harness Self-Evolution
associated_with
The process of updating the external agent harness from execution evidence while keeping model weights fixed
Skills (Harness Artifact)
associated_with
Reusable procedural modules packaged as callable harness artifacts that can be invoked by agents during task solving
Tools (Harness Artifact)
associated_with
Harness components that expose external services and define how agents discover, invoke, and validate them
Memory (Harness Artifact)
associated_with
Harness component storing prior observations, facts, task outcomes, and strategies for later retrieval
Prompts (Harness Artifact)
associated_with
Natural-language harness artifacts that encode standing behavioral rules, task policies, and reasoning procedures
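
As a rough illustration of how these five neighbors relate, the sketch below models a harness as a plain data structure with one field per artifact type, plus an update step that changes the harness while leaving the model untouched. The class name `Harness`, the function `evolve`, and the field layout are hypothetical and chosen only for illustration; the source does not prescribe a representation.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Harness:
    """Non-parametric context around a fixed-weight model (hypothetical layout)."""
    prompts: List[str] = field(default_factory=list)    # standing rules and task policies
    skills: Dict[str, str] = field(default_factory=dict)  # skill name -> procedural module text
    memories: List[str] = field(default_factory=list)   # prior observations and outcomes
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)  # tool name -> callable


def evolve(harness: Harness, skill_name: str, skill_body: str, lesson: str) -> Harness:
    """Return a new harness updated from execution evidence.

    Only harness artifacts change; no model weights are involved here.
    """
    return replace(
        harness,
        skills={**harness.skills, skill_name: skill_body},
        memories=[*harness.memories, lesson],
    )


base = Harness(prompts=["Validate tool output before answering."])
updated = evolve(
    base,
    "retry_on_timeout",
    "If a tool call times out, retry once before reporting failure.",
    "A previous task failed because a timeout was not retried.",
)
```

Returning a new object rather than mutating in place keeps the pre-update harness available for comparison, which is convenient when judging whether an update actually helped.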

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
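
The selection rule for this section can be sketched as a filter: keep entities whose embedding cosine similarity to the current entity is at least 0.65 and which have no typed edge to it. The function names, the toy embeddings, and the entity labels `A`, `B`, and `C` below are assumptions for illustration; how the graph actually stores or computes embeddings is not stated in the source.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def untyped_neighbors(
    target: str,
    embeddings: Dict[str, Sequence[float]],
    typed_edges: Set[str],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Entities similar to `target` that have no typed edge to it, best match first."""
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in typed_edges:
            continue  # skip the entity itself and anything already linked
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy two-dimensional embeddings, purely illustrative.
toy = {"Agent Harness": [1.0, 0.0], "A": [1.0, 0.5], "B": [0.0, 1.0], "C": [1.0, 0.0]}
candidates = untyped_neighbors("Agent Harness", toy, typed_edges={"C"})
```

In this toy case only `A` survives: `B` falls below the threshold and `C` is excluded because it already has a typed edge.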

Agent · concept · 0.766
Any autonomous system, living or non-living, that embodies a perception-action cycle and tries to navigate and persist in an environment
Anchor Agent Set · concept · 0.761
Fixed set of representative task-solving agents (Opus 4.6, Sonnet 4.6, Qwen3-235B) used to compute harness-updating capability metrics
Harness-Benefit Capability · concept · 0.754
The capability of a task-solving agent to benefit from updated harnesses during task solving
Harness Activation Failure · concept · 0.753
A failure mode where weak-tier models fail to invoke relevant harness artifacts (e.g., skills) during task solving
Which models produce useful harness updates? · question · 0.725
First open question the paper sets out to answer through evolver-side analysis
Harness-Following Rate Measurement · method · 0.724
LLM-judge pipeline that measures the fraction of skill-loaded trajectories in which the agent follows the loaded skill's guidance, using Claude Sonnet 4.6 as the judge
Harness-Updating Capability · concept · 0.723
The capability of an evolver model to produce useful persistent harness updates from execution evidence
Harness Adherence Failure · concept · 0.718
A failure mode in which weak-tier models fail to follow the guidance of harness artifacts faithfully, even when those artifacts are loaded
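
The two failure modes and the following-rate metric listed above can be tied together with a small sketch. It assumes each trajectory already carries two boolean labels: whether a skill was loaded, and a judge verdict on whether the loaded guidance was followed. The `Trajectory` record, the label strings, and the sample data are hypothetical; in the source the verdict comes from an LLM judge, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trajectory:
    skill_loaded: bool       # did the agent load a relevant skill at all?
    followed_guidance: bool  # judge verdict; only meaningful when skill_loaded is True


def classify(t: Trajectory) -> str:
    """Label a trajectory with the failure mode it exhibits, if any."""
    if not t.skill_loaded:
        return "activation_failure"  # relevant artifact was never invoked
    if not t.followed_guidance:
        return "adherence_failure"   # artifact was loaded but not followed
    return "followed"


def following_rate(trajectories: List[Trajectory]) -> Optional[float]:
    """Fraction of skill-loaded trajectories in which the guidance was followed."""
    loaded = [t for t in trajectories if t.skill_loaded]
    if not loaded:
        return None  # undefined when no trajectory loaded a skill
    return sum(t.followed_guidance for t in loaded) / len(loaded)


# Made-up sample, for illustration only.
sample = [
    Trajectory(skill_loaded=True, followed_guidance=True),
    Trajectory(skill_loaded=True, followed_guidance=False),
    Trajectory(skill_loaded=False, followed_guidance=False),
    Trajectory(skill_loaded=True, followed_guidance=True),
]
rate = following_rate(sample)
labels = [classify(t) for t in sample]
```

Note that the denominator counts only skill-loaded trajectories, so activation failures do not lower the following rate; they have to be tracked separately.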