quote
active
quote:two-heuristic-code-agents-outperform-most-tested-llms-and-behavioural-traces-surface-recurring-llm-failure-modes-including-overbidding-self-bidding-bankrupt-tc-initiation-and-weak-opponent-state-adaptation
Two heuristic code agents outperform most tested LLMs, and behavioural traces surface recurring LLM failure modes including overbidding, self-bidding, bankrupt TC initiation, and weak opponent-state adaptation.
Abstract sentence summarising performance and failures.
Source paper
extracted_from · (2026) · Robert Müller · Clemens Müller
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- LLMs exhibit systematic errors that deterministic logic avoids.
- author assertion that deterministic heuristics surpass many LLMs
- Calibration that conditional logic can beat cost-efficient LLMs in this setting.
- discussion of potential confounds
- Conditional logic already suffices where LLMs still fail, as code agents avoid systematic failures · claim · 0.821 · contrast between rule-based and LLM reasoning
- key claim about the benchmark's unique diagnostic value
- TrackerAgent's second-place ranking calibrates the benchmark and highlights LLM shortcomings.
- Do LLM failures in CATTLE TRADE reflect genuinely hard strategic problems or errors that novice humans also avoid? · question · 0.787 · Open question about benchmarking against human players to calibrate difficulty.