method
active
method:trueskill-rating-system
TrueSkill rating system
Bayesian skill rating system used to rank agents from game outcomes.
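As an illustration of how such a rating is updated from a single game outcome, here is a minimal sketch of the standard two-player, no-draw TrueSkill update. It is not the benchmark's implementation. The names `Rating`, `rate_1v1`, and `BETA` are hypothetical, the defaults (mu = 25, sigma = 25/3, beta = 25/6) are the commonly cited ones, and the dynamics term that inflates sigma between games is deliberately omitted.

```python
import math
from typing import NamedTuple, Tuple


class Rating(NamedTuple):
    """Gaussian belief about one agent's skill (hypothetical helper type)."""
    mu: float = 25.0          # mean skill estimate
    sigma: float = 25.0 / 3   # uncertainty of the estimate


BETA = 25.0 / 6  # assumed per-game performance noise


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rate_1v1(winner: Rating, loser: Rating, beta: float = BETA) -> Tuple[Rating, Rating]:
    """Return updated (winner, loser) ratings after one decisive game.

    Assumes no draws and no dynamics noise; extreme rating gaps could
    underflow the CDF, which this sketch does not guard against.
    """
    c2 = 2.0 * beta ** 2 + winner.sigma ** 2 + loser.sigma ** 2
    c = math.sqrt(c2)
    t = (winner.mu - loser.mu) / c   # normalized skill gap
    v = _pdf(t) / _cdf(t)            # mean shift factor (larger for upsets)
    w = v * (v + t)                  # variance shrink factor, in (0, 1)

    new_winner = Rating(
        mu=winner.mu + winner.sigma ** 2 / c * v,
        sigma=math.sqrt(winner.sigma ** 2 * (1.0 - winner.sigma ** 2 / c2 * w)),
    )
    new_loser = Rating(
        mu=loser.mu - loser.sigma ** 2 / c * v,
        sigma=math.sqrt(loser.sigma ** 2 * (1.0 - loser.sigma ** 2 / c2 * w)),
    )
    return new_winner, new_loser


if __name__ == "__main__":
    a, b = rate_1v1(Rating(), Rating())
    print(a, b)
```

Agents can then be ranked by mu, or by a conservative estimate such as mu minus a multiple of sigma; which convention the benchmark uses is not stated here.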
Neighborhood — ranked by edge-count
Thinkers (1)
thinker
- Ralf Herbrich · introduces · First author of the TrueSkill Bayesian skill rating system, used in the benchmark.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Bayesian skill rating system used for competitive ranking in CATTLE TRADE
- Comparison to external leaderboards showing misalignment.
- The deep, unpretentious ease that arises in places where people can be themselves, supported by the subtle adaptation of the physical environment.
- DeepSeek v3.2 TrueSkill rating
- EconomyAgent TrueSkill rating
- Best code agent outperforming six of seven LLMs.
- Gemini 2.5 Flash Lite, lowest TrueSkill rating
- Second-best LLM, competitive with TrackerAgent.