Card-counting heuristics suffice to outperform most LLMs tested.

TrackerAgent's second-place ranking calibrates the benchmark and highlights LLM shortcomings.

Source paper

extracted_from

Cattle Trade: A Multi-Agent Benchmark for LLM Bluffing, Bidding, and Bargaining

(2026) · Robert Müller · Clemens Müller

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Two heuristic code agents outperform most tested LLMs · claim · 0.817
Author assertion that deterministic heuristics surpass many LLMs
Two heuristic code agents outperform most tested LLMs, and behavioural traces surface recurring LLM failure modes including overbidding, self-bidding, bankrupt TC initiation, and weak opponent-state adaptation. · quote · 0.809
Abstract sentence summarising performance and failures.
Two heuristic code agents (TrackerAgent and SetRaceAgent) outperform most tested LLMs. · claim · 0.786
Calibration that conditional logic can beat cost-efficient LLMs in this setting.
Li et al. 2024: larger LLMs outperform smaller ones at distinguishing self-related from non-self-related properties on self-awareness benchmarks · finding · 0.769
Prior finding showing scale-dependent self-awareness, consistent with the scale effect observed in the paper's Experiment 1
SetRaceAgent outperforms five of seven tested LLMs · finding · 0.760
SetRaceAgent ranked above DS-v3.2, GPT5.4-N, Haiku, G2.5-FL, and EconomyAgent.
LLMs can predict their own responses more accurately than external observers, implying privileged internal knowledge · finding · 0.756
Binder et al. finding cited as evidence that LLMs possess introspective capacity analogous to mindfulness
TrackerAgent outperforms six of seven tested LLMs · finding · 0.753
In the 98-game slice, TrackerAgent had a higher win rate or TrueSkill than all LLMs except Gemini 3 Flash.
Conditional logic already suffices where LLMs still fail, as code agents avoid systematic failures · claim · 0.747
Contrast between rule-based and LLM reasoning