finding

active

finding:g2-5-fl-repeatedly-initiates-tcs-after-depleting-money-through-overbidding

G2.5-FL repeatedly initiates TCs after depleting money through overbidding

failure to condition action choice on resource state

Source paper

extracted_from

Cattle Trade: A Multi-Agent Benchmark for LLM Bluffing, Bidding, and Bargaining

(2026) · Robert Müller · Clemens Müller

Neighborhood — ranked by edge-count

Claims (1)

claim

Behavioural traces surface recurring LLM failure modes including overbidding, self-bidding, bankrupt TC initiation, and weak opponent-state adaptation that never appear in code agents.
supports
LLMs exhibit systematic errors that deterministic logic avoids.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

G2.5-FL overbid rate=1.20%, highest among all agents · finding · 0.791
highest overbid frequency observed
G2.5-FL bid aggressiveness 2.07 early and 2.08 late (no adaptation) · finding · 0.789
Failure to adapt bidding to game phase.
G2.5-FL initiates a trade challenge for a goose with zero money cards, offering 0-value bluff · finding · 0.776
In one trace, G2.5-FL depleted money through overbidding and launched a TC with no resources, failing to condition action on resource state.
G3-F conditions TC offers on opponent wealth and game context, e.g., 0-value bluffs against bankrupt opponents · finding · 0.772
sophisticated bluff calibration
Overbid frequency, self-bidding rate, bankrupt-initiation patterns, and context-dependent offer calibration are failure modes invisible to both static evaluations and aggregate rankings like Elo · claim · 0.747
key claim about the benchmark's unique diagnostic value
G2.5-FL self-bid rate=78.5% · finding · 0.730
highest self-bid rate among all agents
G2.5-FL cost per quartet 1,193 coins · finding · 0.728
Much higher cost per quartet due to waste.
Two heuristic code agents outperform most tested LLMs, and behavioural traces surface recurring LLM failure modes including overbidding, self-bidding, bankrupt TC initiation, and weak opponent-state adaptation. · quote · 0.725
Abstract sentence summarising performance and failures.
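
The "related by similarity" selection described above (cosine ≥ 0.65, no typed edge) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the page's actual implementation: the function names, the dictionary of embeddings, and the set-of-pairs representation of typed edges are all hypothetical. Only the 0.65 cutoff and the "no typed edge" condition come from the page.

```python
import math

SIMILARITY_THRESHOLD = 0.65  # cutoff stated on the page: cosine >= 0.65


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(entity_id, embeddings, typed_edges,
                          threshold=SIMILARITY_THRESHOLD):
    """Return (other_id, score) pairs above the threshold with no typed edge.

    embeddings:  dict of entity id -> vector (hypothetical storage layout)
    typed_edges: set of frozenset({id_a, id_b}) for pairs that already
                 have a typed relation (hypothetical representation)
    Results are sorted by score, highest first.
    """
    target = embeddings[entity_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == entity_id:
            continue  # never relate an entity to itself
        if frozenset((entity_id, other_id)) in typed_edges:
            continue  # already linked by a typed edge, so not a candidate
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, an entity that already has a typed edge to this finding (such as the claim listed under "Claims") would be excluded from the list even if its embedding were very close, which matches the list's stated purpose of surfacing candidates for new edges or unrecognized duplicates.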