finding
active
finding:trackeragent-win-rate-53-6-in-98-canonical-games
TrackerAgent win rate 53.6% in 98 canonical games
TrackerAgent won over half (53.6%) of the combined-comp1 games.
Source paper
extracted_from · (2026) · Robert Müller · Clemens Müller
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- TrackerAgent won more than half its games
- Best code agent outperforming six of seven LLMs.
- second-highest TrueSkill rating
- Gemini 3 Flash won nearly 3/4 of its games
- In the 98-game slice, TrackerAgent had a higher win rate or TrueSkill than all LLMs except Gemini 3 Flash.
- TrackerAgent uses card counting to achieve high rating, a capability no LLM replicates
- Poor performance against code agents.
- performance in mixed games against three code agents