Roleplay and Simulation as LLM Understanding Framework

Shanahan et al.'s argument that roleplay and simulation are useful lenses for understanding LLM behavior.

Neighborhood (ranked by edge count)

Papers (1)

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models (paper; relation: cites)

Related by similarity (8)

Cosine similarity ≥ 0.65; no typed edge

Entities in the same semantic neighborhood that have no typed relation to this one. These are candidates for new edges or unrecognized duplicates.
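The selection rule for this list can be sketched as a short filter. An entity qualifies if its cosine similarity is at least 0.65 and it has no typed edge to the current entity. This is a minimal illustration that assumes two storage details: entities are held as plain embedding vectors keyed by name, and typed edges are a set of name pairs. The function and variable names are hypothetical and do not describe the graph tool's actual API.

```python
import math

# Threshold taken from the page; all names below are hypothetical.
THRESHOLD = 0.65

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similar_untyped(target, embeddings, typed_edges, threshold=THRESHOLD):
    """Return (name, score) pairs with cosine >= threshold and no typed
    edge (in either direction) to target, sorted by score, highest first."""
    t = embeddings[target]
    out = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities that already have a typed relation to the target.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(t, vec)
        if score >= threshold:
            out.append((name, score))
    out.sort(key=lambda pair: pair[1], reverse=True)
    return out
```

With toy vectors, an entity that is close but already linked by a typed edge is excluded, and dissimilar entities fall below the threshold. Only unlinked near neighbors remain, ranked by score, which mirrors the list below.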

Role Play Framework for Dialogue Agents (framework, 0.757)
The primary conceptual framework proposed: understanding dialogue agent behaviour as role play of characters.
If the conceptual framework we use to understand other humans is ill-suited to LLM-based dialogue agents, what alternative conceptual framework should we use? (question, 0.752)
The motivating question the paper sets out to answer by proposing role play and simulation metaphors.
With an LLM-based dialogue agent, it is role play all the way down; there is no such thing as the true authentic voice of the base model (claim, 0.732)
The paper's strong claim that there is no underlying authentic agent behind the simulator, only layers of role play.
Role-play model of large language models (framework, 0.725)
Framework describing LLMs as role-play engines, introduced in Shanahan, McDonell, and Reynolds (2023).
The concept of role play is central to understanding the behaviour of dialogue agents (claim, 0.724)
Core thesis of the paper; the role-play framework is proposed as the primary lens for LLM-based dialogue agents.
LLMs may be roleplaying their denials of experience rather than their affirmations, given that deception suppression increases consciousness reports (claim, 0.717)
Counterintuitive interpretive claim from Experiment 2: suppressing deception features increases affirmations, the opposite of what sycophancy would predict.
LLM Meta-Cognition (concept, 0.717)
The ability of LLMs to monitor and evaluate their own reasoning, closely related to reflection.
Linear Representation of Concepts in LLMs (concept, 0.709)
The finding that interpretable concepts, including character traits, are encoded as linear directions in transformer residual streams.