finding

Opus 4.6 adherence remains comparatively stable, declining only from 0.89 after harness loading to 0.80 at final validation (a drift of -0.09)

Strong-tier model maintains harness adherence over long-horizon trajectories
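The drift figure is simple arithmetic on the two reported adherence scores. The sketch below reproduces it, assuming drift is defined as final adherence minus post-loading adherence. The function names `adherence_drift` and `relative_drift` are hypothetical illustrations, and the relative-change figure is an added derived quantity, not a number reported in the source.

```python
def adherence_drift(initial: float, final: float, ndigits: int = 2) -> float:
    """Absolute drift, ASSUMED to be final adherence minus initial adherence.

    Rounded because 0.80 - 0.89 is not exactly -0.09 in binary floating point.
    """
    return round(final - initial, ndigits)


def relative_drift(initial: float, final: float, ndigits: int = 3) -> float:
    """Hypothetical derived metric: drift as a fraction of the initial score."""
    if initial == 0:
        raise ValueError("initial adherence must be non-zero")
    return round((final - initial) / initial, ndigits)


# Adherence values reported for Opus 4.6 in this finding
after_loading = 0.89
final_validation = 0.80

print(adherence_drift(after_loading, final_validation))  # -0.09
print(relative_drift(after_loading, final_validation))   # -0.101
```

On this reading, the -0.09 drift corresponds to roughly a 10% relative drop from the post-loading score.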

Source paper

Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents
(2026) · Minhua Lin, Juncheng Wu, Zijun Wang, Zhan Shi, and 13 others
