finding
active
finding:g2-5-fl-repeatedly-initiates-tcs-after-depleting-money-through-overbidding
G2.5-FL repeatedly initiates TCs after depleting money through overbidding
failure to condition action choice on resource state
Source paper
extracted_from (2026) · Robert Müller · Clemens Müller
Neighborhood — ranked by edge-count
Claims (1)
claim
- LLMs exhibit systematic errors that deterministic logic avoids.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- highest overbid frequency observed
- Failure to adapt bidding to game phase.
- G2.5-FL initiates a trade challenge for a goose with zero money cards, offering 0-value bluff · finding · 0.776
  In one trace, G2.5-FL depleted money through overbidding and launched a TC with no resources, failing to condition action on resource state.
- sophisticated bluff calibration
- key claim about the benchmark's unique diagnostic value
- highest self-bid rate among all agents
- Much higher cost per quartet due to waste.
- Abstract sentence summarising performance and failures.