claim
active
claim:multi-turn-strategic-play-depends-on-capabilities-state-tracking-adaptive-resource-allocation-structured-output-reliability-that-static-benchmarks-do-not-measure-but-conversational-evaluations-partially-capture
Multi-turn strategic play depends on capabilities (state tracking, adaptive resource allocation, structured-output reliability) that static benchmarks do not measure but conversational evaluations partially capture
explains divergence from static benchmarks
Source paper
extracted_from · (2026) · Robert Müller · Clemens Müller
Neighborhood — ranked by edge-count
Concepts (1)
concept
- strategic reasoning · associated_with · High-level cognitive ability to plan and act under uncertainty and adversarial conditions.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Broader methodological claim about the need for multi-agent, long-horizon benchmarks.
- summary claim linking measured traits to outcomes
- Motivational statement for the benchmark design philosophy.
- broader framing question for the benchmark
- Summary of sophisticated plant behaviours that support the inference of cognition.
- Three of four concepts show significant introspection at turn 1; rules out joint temporal drift as sole explanation
- central finding phrased as a load-bearing sentence
- Key reference documenting Meta's CICERO using deception in Diplomacy despite cooperative design intent