finding
active
finding:token-usage-varies-roughly-20-across-models-from-14-800-g3-1-fl-to-275-000-g3-f-per-game
Token usage varies roughly 20× across models, from ~14,800 (G3.1-FL) to ~275,000 (G3-F) per game
Reasoning verbosity does not predict strategic strength: both top and weak models span a wide range of token usage.
Source paper
extracted_from (2026) · Robert Müller · Clemens Müller
Neighborhood — ranked by edge-count
Claims (1)
claim
- G3-F uses ~275k tokens per game while G3.1-FL uses ~14.8k, yet both rank among the top models; token volume alone does not predict strategic quality.
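As a quick arithmetic check on the headline ratio, the sketch below divides the two per-game token counts quoted in this finding. The dictionary name and the helper function are illustrative and not taken from the source paper, and the inputs are the approximate figures stated here rather than exact measurements.

```python
# Approximate per-game token counts as quoted in this finding (rounded values).
tokens_per_game = {"G3.1-FL": 14_800, "G3-F": 275_000}


def usage_ratio(counts: dict[str, int]) -> float:
    """Return the ratio of the highest to the lowest token count (hypothetical helper)."""
    return max(counts.values()) / min(counts.values())


ratio = usage_ratio(tokens_per_game)
print(f"{ratio:.1f}x")  # prints "18.6x"
```

With these rounded inputs the ratio comes out near 18.6, which is consistent with the "roughly 20×" wording of the finding.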
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Very efficient token usage with strong play.
- verbose reasoning not required for strong play
- Gemini 3 Flash won nearly 3/4 of its games
- Basic SAE performance metrics.
- performance in mixed games against three code agents
- Second-best LLM, competitive with TrackerAgent.
- Training scale for second stage.
- Training details for first stage.