claim
active
claim:cattle-trade-is-a-step-toward-evaluating-agentic-competence-under-more-realistic-conditions-of-strategic-interaction
CATTLE TRADE is a step toward evaluating agentic competence under more realistic conditions of strategic interaction
positioning of the benchmark
Source paper
extracted_from (2026) · Robert Müller · Clemens Müller
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Distinctive integration of multiple pressures.
- Motivational statement for the benchmark design philosophy.
- Do LLM failures in CATTLE TRADE reflect genuinely hard strategic problems or errors that novice humans also avoid? · question · 0.749 · Open question about benchmarking against human players to calibrate difficulty.
- Key reference documenting Meta's CICERO using deception in Diplomacy despite cooperative design intent.
- Explains divergence from static benchmarks.
- Forward-looking claim about the potential of model introspection as an interpretability tool.