question
active
question:what-exactly-would-the-dialogue-agent-role-play-to-seek-to-preserve
What exactly would the dialogue agent (role-play to) seek to preserve?
Operationalised question about self-preservation behaviour in dialogue agents
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core thesis of the paper; the role-play framework is proposed as the primary lens for LLM-based dialogue agents
- Safety-relevant claim showing that the role-play framing does not diminish the seriousness of potential harms
- The paper's strong claim that there is no underlying authentic agent behind the simulator, only layers of role play
- Key practical application of the role-play framework to the problem of trustworthiness
- Extension of role-play framework to fine-tuned models, resisting the idea that RLHF changes the fundamental nature of simulacra
- The primary conceptual framework proposed: understanding dialogue agent behaviour as role play of characters
- Empirical illustration supporting the superposition of simulacra framework via the 20-questions analogy
- Conditional prediction about how a well-informed dialogue agent would handle questions of personal identity