finding · active

finding:g2-5-fl-initiates-a-trade-challenge-for-a-goose-with-zero-money-cards-offering-0-value-bluff

G2.5-FL initiates a trade challenge for a goose with zero money cards, offering a 0-value bluff

In one trace, G2.5-FL exhausted its money through overbidding and then launched a trade challenge (TC) with no money cards left. It failed to condition its choice of action on its own resource state.
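The sketch below illustrates what "conditioning action on resource state" could look like. It is a minimal, hypothetical gate that an agent might check before opening a TC. The names `ResourceState`, `should_initiate_tc` and `bluff_threshold` are illustrative assumptions and do not come from the benchmark. The rule is also an assumption. It allows a TC whenever the agent holds money, and it allows a 0-value bluff only against an opponent estimated to be broke. It is loosely modeled on the G3-F behavior noted in this finding's neighborhood and is not the paper's method.

```python
from dataclasses import dataclass


@dataclass
class ResourceState:
    """Hypothetical snapshot of the money information available to the challenger."""
    own_money: int       # total value of the agent's own money cards
    opponent_money: int  # agent's estimate of the opponent's total money


def should_initiate_tc(state: ResourceState, bluff_threshold: int = 0) -> bool:
    """Hypothetical gate: only open a trade challenge the agent can plausibly back.

    - If the agent holds any money, a TC is allowed.
    - If the agent holds no money, a 0-value bluff is attempted only when the
      opponent's estimated wealth is at or below `bluff_threshold`
      (e.g. an opponent believed to be bankrupt).
    """
    if state.own_money > 0:
        return True
    return state.opponent_money <= bluff_threshold
```

With the default threshold, `should_initiate_tc(ResourceState(0, 120))` returns `False`, so a broke agent stays out of a TC against a wealthy opponent. By contrast, `should_initiate_tc(ResourceState(0, 0))` returns `True`, which allows a 0-value bluff against a broke opponent.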

Source paper

extracted_from

Cattle Trade: A Multi-Agent Benchmark for LLM Bluffing, Bidding, and Bargaining

(2026) · Robert Müller · Clemens Müller

Neighborhood — ranked by edge-count

Claims (1)

claim

The benchmark’s diagnostic value lies in identifying why a model loses, not just that it loses
supports
argues for fine-grained behavioral analysis over aggregate rankings

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

G3-F conditions TC offers on opponent wealth and game context, e.g., 0-value bluffs against bankrupt opponents · finding · cosine 0.803
Sophisticated bluff calibration.

G2.5-FL repeatedly initiates TCs after depleting money through overbidding · finding · cosine 0.776
Failure to condition action choice on resource state.

G2.5-FL bid aggressiveness 2.07 early and 2.08 late (no adaptation) · finding · cosine 0.755
Failure to adapt bidding to game phase.

G3.1-FL generates ~14,800 completion tokens per game · finding · cosine 0.722
Very efficient token usage with strong play.

G3.1-FL wins 50.0% of 28 mixed games · finding · cosine 0.713
Wins half of its mixed games against the code agents.

G2.5-FL self-bid rate = 78.5% · finding · cosine 0.711
Highest self-bid rate among all agents.

G3-F wins 67.9% of 28 mixed games (vs. three code agents) · finding · cosine 0.711
Robust performance against algorithmic baselines.

G3-F bid aggressiveness ramp 0.26 (early) → 2.49 (late), ≈10× escalation · finding · cosine 0.705
Strong phase-adaptive bidding.