Role play with large language models

By M. Shanahan · Kyle McDonell · Laria Reynolds, EleutherAI

DOI 10.1038/s41586-023-06647-8 arXiv 2305.16367

Bing Chat · Role Play Framework for Dialogue Agents · ChatGPT · Simulacra in Superposition Framework · Eliza Effect · Embodied Language Acquisition · Google Bard · Gopher · GPT-2 · GPT-3 · GPT-4 · Jailbreaking · LaMDA · Llama 2

Frameworks (2)

Role Play Framework for Dialogue Agents
The primary conceptual framework proposed: understanding dialogue agent behaviour as role play of characters
Simulacra in Superposition Framework
The more nuanced second metaphor: LLM as simulator maintaining a superposition of possible simulacra across a multiverse of characters

Original abstract

By casting large-language-model-based dialogue-agent behaviour in terms of role play, it is possible to describe dialogue-agent behaviour such as (apparent) deception and (apparent) self-awareness without misleadingly ascribing human characteristics to the models. As dialogue agents become increasingly human-like in their performance, we must develop effective ways to describe their behaviour in high-level terms without falling into the trap of anthropomorphism. Here we foreground the concept of role play. Casting dialogue-agent behaviour in terms of role play allows us to draw on familiar folk psychological terms, without ascribing human characteristics to language models that they in fact lack. Two important cases of dialogue-agent behaviour are addressed this way, namely, (apparent) deception and (apparent) self-awareness.

Cited by (2)

The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated "Sacred" Text?
A 12-verse AI-generated Buddhist "sutra" produced in a 13,700-word, 29-turn conversation with OpenAI's ChatGPT o3 in April 2025 carries non-trivial philosophical meaning despite its mechanistic origin
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
No current AI system is a strong candidate for phenomenal consciousness, yet there are no obvious technical barriers to building one — this is the central finding of Butlin et al. (2023), a systematic