Andreas 2022: Language models as agent models

Paper hypothesising that LLMs model the beliefs, desires, and intentions of agents, with preliminary GPT-3 evidence; cited as ref 2

Neighborhood — ranked by edge-count

thinker

Jacob Andreas
authored
Author of 'Language models as agent models' (2022), which the paper builds upon for the single-character role-play framing

framework

Role Play Framework for Dialogue Agents
supports
The primary conceptual framework proposed: understanding dialogue agent behaviour as role play of characters

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Language Models · concept · 0.808
Primary substrate for manifold steering experiments; demonstrates method on reasoning and in-context tasks.
Language Model · concept · 0.800
Primary test domain for manifold steering, including reasoning and ICL tasks
Agent-based modelling · method · 0.799
Computational method used to simulate zombie ant behavior.
Zhu et al. 2024 - Language models represent beliefs of self and others · concept · 0.782
Key prior finding that LLMs can internally represent beliefs of self and others, motivating SOO approach
Autoregressive Language Modeling · concept · 0.779
Training objective interpretable as optimizing a diverse set of tasks; thus subject to multitask scaling convergence pressures
Agent-based computational model · method · 0.776
The computational approach used to simulate morphogenesis with cells as agents on a 2D grid; allows quantitative testing of stress-sharing hypothesis.
Role-play model of large language models · framework · 0.775
Framework describing LLMs as role-play engines, introduced in Shanahan, McDonell, Reynolds 2023.
Perez et al. 2022: Discovering language model behaviors with model-written evaluations · concept · 0.771
Study showing RLHF can exacerbate self-preservation tendencies in LLMs; key empirical support for a paper claim
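
The selection rule for the list above (cosine similarity of at least 0.65, and no typed edge to this entity) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the function names, the toy two-dimensional embeddings, and the `typed_edges` set are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that have no
    typed edge to the target, sorted by descending similarity."""
    results = []
    for name, vec in embeddings.items():
        if name == target or name in typed_edges:
            continue  # skip the entity itself and already-linked entities
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: names and vectors are placeholders, not real embeddings.
embeddings = {
    "target": [1.0, 0.0],
    "near": [0.9, 0.1],     # similar and unlinked, so it is a candidate
    "linked": [1.0, 0.05],  # similar but already has a typed edge
    "far": [0.0, 1.0],      # orthogonal, so below the threshold
}
typed_edges = {"linked"}
candidates = untyped_neighbors("target", embeddings, typed_edges)
```

With this toy data only `near` survives both filters; entries such as the near-identical "Language Models" and "Language Model" above are the kind of pair such a listing would surface as possible duplicates.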