finding
active
finding:sonnet-4-5-win-rate-35-7-n-14
Sonnet 4.5 win rate = 35.7% (n=14)
Sonnet's win rate in exploratory games
Source paper
extracted_from (2026) · Robert Müller · Clemens Müller
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Mid-field performance with larger uncertainty due to small sample.
- Small-sample mixed play result
- Quantifies harness activation failure for weak-tier models vs. strong-tier models
- Full evolver-side SWE results showing comparable performance across Claude family tiers
- TrackerAgent won more than half its games
- Linked to Claude 3.5 Sonnet not exhibiting pro-animal-welfare preferences
- Gemini 3 Flash won nearly 3/4 of its games
- Demonstrates prompt effect crosses model tiers; smaller model with prompt beats larger without