framework
archived
framework:cattle-trade-benchmark

CATTLE TRADE benchmark

A four-player, multi-agent benchmark that combines auctions, hidden-offer trade challenges, bluffing, bargaining, and resource management over 50 to 60 turns, used to evaluate LLMs and code agents.

Neighborhood — ranked by edge-count

Methods (10)

method
  • Auction mode with iterative call rounds in which all non-auctioneer players submit bids simultaneously, faithful to the tabletop rules.
  • Free-text memory buffer updated each turn via an additional model call, included in subsequent observations under 'YOUR NOTES'.
  • Agent personal buffer updated after own turn via an extra model call, fed back into observations.
  • Auction mode with a single sealed bid per player.
  • Auction mode with sequential bidding.
  • TrueSkill
    implements
    Bayesian skill rating system used for competitive ranking in CATTLE TRADE
  • Bayesian skill rating system used to rank agents from game outcomes.
  • Algorithm that finds the minimum-overpay combination of discrete money cards to meet a payment amount with no change given.
  • Agent configuration where scratchpad is maintained and recent game events are provided in observations.
  • Agents respond with JSON specifying exact card selections and amounts; includes multi-stage fallback for errors.
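
The minimum-overpay payment method listed above can be illustrated with a short sketch. This is not the benchmark's own code: the function name min_overpay, the representation of a hand as a list of integer card values, and the tie-break (fewest cards among selections with the same total) are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def min_overpay(hand: Sequence[int], amount: int) -> Optional[Tuple[int, ...]]:
    """Pick cards whose total is the smallest sum >= amount.

    Hypothetical helper: returns None when the whole hand cannot cover
    the amount. No change is given, so any excess is simply lost.
    """
    # best[s] holds a fewest-card selection totalling exactly s.
    best = {0: ()}
    for card in hand:
        # Iterate over a snapshot so each physical card is used at most once.
        for total, chosen in list(best.items()):
            new_total = total + card
            candidate = chosen + (card,)
            if new_total not in best or len(candidate) < len(best[new_total]):
                best[new_total] = candidate
    feasible = [total for total in best if total >= amount]
    if not feasible:
        return None
    return best[min(feasible)]


# Illustrative hand; the card values here are example inputs only.
payment = min_overpay([50, 100, 200], 120)
print(payment, "overpays by", sum(payment) - 120)
```

With the example hand, no selection totals exactly 120, so the cheapest way to pay is 50 + 100, an overpayment of 30. The dynamic program over reachable totals avoids enumerating every subset of the hand.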

Concepts (7)

concept
  • auction
    about
    Competitive bidding mechanism in the game where players vie for animal cards.
  • Payments use fixed denominations; no change given, forcing overpayment and resource constraint management.
  • Deceptive strategy using 0-value money cards in face-down offers to induce opponent acceptance without revealing true offer value.
  • Original 3–5 player card game by Rüdiger Koltze (Ravensburger, 1985) involving auctions, hidden offers, and bluffing, which CATTLE TRADE adapts.
  • Bilateral bargaining with face-down money offers, enabling bluffing via 0-value cards and information asymmetry.
  • Game condition where players do not know the exact money values held by opponents, only counts.
  • Allocating discrete money cards and animal holdings over many turns to maximize final score.
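
The bluffing and information-asymmetry concepts above come down to one observation: a face-down offer shows how many cards it contains, not what they are worth. The sketch below makes that concrete. The class FaceDownOffer and its method names are hypothetical, and it assumes the opponent sees only the card count of an offer, mirroring the counts-only visibility described for opponents' money.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FaceDownOffer:
    """Hypothetical model of a hidden trade offer."""

    cards: Tuple[int, ...]  # true values, known only to the offering player

    def public_view(self) -> int:
        """What the opponent can observe: the number of cards only."""
        return len(self.cards)

    def true_value(self) -> int:
        """The actual worth of the offer, revealed only after resolution."""
        return sum(self.cards)


# Two offers that look identical from the outside (example values only).
honest = FaceDownOffer((100, 50, 10))
bluff = FaceDownOffer((0, 0, 10))

print(honest.public_view(), bluff.public_view())  # same visible card count
print(honest.true_value(), bluff.true_value())    # very different worth
```

Both offers present three cards to the opponent, yet one is worth 160 and the other 10. Padding an offer with 0-value cards is what lets a player induce acceptance without committing real money.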

Artifacts (3)

artifact
