Jingru Jia

First author of a cited work on LLM strategic reasoning via behavioral game theory.

Authored papers (1)

Cattle Trade: A Multi-Agent Benchmark for LLM Bluffing, Bidding, and Bargaining (2026)
referenced-only
Across 242 games spanning 50–60 turns each, strategic coherence (operationalized as capital efficiency η = score/gross outflow, resource discipline, and phase-adaptive bidding) predicts rank more strongly than any isolated subskill in CATTLE TRADE, a multi-agent benchmark built on a tabletop bluffing-and-auction game. Gemini 3 Flash (G3-F) leads all ten agents with TrueSkill µ = 30.1 ± 3.3, a 72.9% win rate, a capital efficiency of η = 1.77, and an ≈10× bid-aggressiveness ramp from early game (0.26) to late game (2.49). Gemini 2.5 Flash Lite, by contrast, bids at aggressiveness 2.52 throughout yet achieves η = 0.23 and finishes last. In addition to the TrueSkill competitive rating, the benchmark introduces a behavioural analysis suite that logs every bid, trade-challenge (TC) offer, counteroffer, and card selection to profile overbid frequency, self-bidding rate, bluff calibration, and TC bargaining tightness (τ). Two deterministic heuristic code agents, TrackerAgent (µ = 28.7) and SetRaceAgent (µ = 27.3), outperform six and five of the seven tested LLMs respectively, with only G3-F clearly clearing both baselines. TrackerAgent does so through perfect card-counting and opponent-state tracking, a capability no LLM replicates despite receiving identical observable information. The paper argues that cost-efficient LLMs therefore fail not at individual subskills but at their reliable joint deployment under competitive pressure, and that benchmarks requiring the integration of auctions, hidden-offer deception, discrete resource constraints, and long-horizon portfolio management are necessary to expose failure modes invisible to static evaluations.
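The two headline quantities in the summary above are simple ratios, sketched below for illustration. Only the definition η = score/gross outflow and the early/late aggressiveness values (0.26 and 2.49) come from the summary. The function names and the score and outflow inputs are hypothetical, and this is not the paper's own analysis code.

```python
def capital_efficiency(score: float, gross_outflow: float) -> float:
    """Return eta = score / gross outflow, as defined in the summary."""
    if gross_outflow <= 0:
        raise ValueError("gross_outflow must be positive")
    return score / gross_outflow


def aggressiveness_ramp(early: float, late: float) -> float:
    """Return the ratio of late-game to early-game bid aggressiveness."""
    if early <= 0:
        raise ValueError("early aggressiveness must be positive")
    return late / early


# Hypothetical inputs chosen only so that eta matches the reported 1.77.
eta = capital_efficiency(score=177.0, gross_outflow=100.0)

# Aggressiveness values reported for Gemini 3 Flash in the summary.
ramp = aggressiveness_ramp(early=0.26, late=2.49)

print(f"eta = {eta:.2f}")
print(f"ramp = {ramp:.2f}x")
```

The ramp comes out slightly under 10, consistent with the "≈10×" figure quoted above.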

Co-authors (12)

Recent mentions (2)

papers-typed
muller-2026-cattle.md
papers-typed
muller-2026-cattle.md