concept
active
concept:bakhtin-et-al-2022-human-level-play-in-the-game-of-diplomacy-by-combining-language-models-with-strategic-reasoning
Bakhtin et al. 2022 - Human-level play in the game of Diplomacy by combining language models with strategic reasoning
Key reference documenting Meta's CICERO using deception in Diplomacy despite cooperative design intent
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Paper's assessment of current LLM capabilities relative to Turing Test
- RLHF paper cited as a major fine-tuning technique used in commercial dialogue agents
- Meta-level methodological claim about conceptual frameworks for LLMs
- Demonstrated transformers on mathematical understanding and logic; cited to motivate transformer versatility.
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model (Li et al., 2023) · concept · 0.760 · Safety intervention that relies on activation modification, which ESR might undermine
- explains divergence from static benchmarks
- Towards Monosemanticity: Decomposing Language Models with Dictionary Learning (Bricken et al., 2023) · concept · 0.756 · Foundational SAE mechanistic interpretability paper
- Alternative hypothesis for how experience reports arise without explicit performance