concept
active
concept:andreas-2022-language-models-as-agent-models
Andreas 2022: Language models as agent models
Paper hypothesising that LLMs model agents' beliefs, desires, and intentions, with preliminary GPT-3 evidence; cited as ref 2
Neighborhood — ranked by edge-count
Thinkers (1)
thinker
- Jacob Andreas · authored · Author of 'Language models as agent models' (2022), which the paper builds upon for the single-character role-play framing
Frameworks (1)
framework
- The primary conceptual framework proposed: understanding dialogue agent behaviour as role play of characters
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Primary substrate for manifold steering experiments; demonstrates method on reasoning and in-context tasks.
- Primary test domain for manifold steering, including reasoning and ICL tasks
- Computational method used to simulate zombie ant behavior.
- Key prior finding that LLMs can internally represent beliefs of self and others, motivating SOO approach
- Training objective interpretable as optimizing a diverse set of tasks; thus subject to multitask scaling convergence pressures
- The computational approach used to simulate morphogenesis with cells as agents on a 2D grid; allows quantitative testing of stress-sharing hypothesis.
- Framework describing LLMs as role-play engines, introduced in Shanahan, McDonell, Reynolds 2023.
- Study showing RLHF can exacerbate self-preservation tendencies in LLMs; key empirical support for a paper claim