concept

active

concept:bakhtin-et-al-2022-human-level-play-in-the-game-of-diplomacy-by-combining-language-models-with-strategic-reasoning

Bakhtin et al. 2022 - Human-level play in the game of Diplomacy by combining language models with strategic reasoning

Key reference documenting that Meta's CICERO used deception in Diplomacy despite its cooperative design intent

Neighborhood — ranked by edge-count

Papers (1)

paper

Towards Safe and Honest AI Agents with Neural Self-Other Overlap
cites

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Today's Large Language Models have become so good at playing Turing's game that it often takes experts to demonstrate the present limits of their ability to simulate human-like intelligence. · claim · 0.780
Paper's assessment of current LLM capabilities relative to Turing Test
Ouyang et al. 2022: Training language models to follow instructions with human feedback · concept · 0.774
RLHF paper cited as a major fine-tuning technique used in commercial dialogue agents
The most effective strategy is not to cling to a single metaphor but to shift freely between multiple metaphors for LLM-based dialogue agents · claim · 0.764
Meta-level methodological claim about conceptual frameworks for LLMs
Language models are few-shot learners (Brown et al., 2020) · concept · 0.762
Demonstrated transformers on mathematical understanding and logic; cited to motivate transformer versatility.
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model (Li et al., 2023) · concept · 0.760
Safety intervention that relies on activation modification, which ESR might undermine
Multi-turn strategic play depends on capabilities (state tracking, adaptive resource allocation, structured-output reliability) that static benchmarks do not measure but conversational evaluations partially capture · claim · 0.757
explains divergence from static benchmarks
Towards Monosemanticity: Decomposing Language Models with Dictionary Learning (Bricken et al., 2023) · concept · 0.756
Foundational SAE mechanistic interpretability paper
Models might produce first-person experiential language by drawing on human-authored self-descriptions in pretraining data without internally encoding these acts as roleplay · hypothesis · 0.756
Alternative hypothesis for how experience reports arise without explicit performance