What exactly would the dialogue agent (role-play to) seek to preserve?

Operationalised question about self-preservation behaviour in dialogue agents

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
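
A minimal sketch of how such a list could be produced, assuming each entity has an embedding vector and that typed relations are stored as unordered pairs of entity IDs. The function names, the data layout, and the undirected treatment of edges are all hypothetical; only the 0.65 cosine threshold and the "no typed edge" condition come from this page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold, best first.

    embeddings:  dict mapping entity ID -> vector (hypothetical layout)
    typed_edges: set of frozenset({id_a, id_b}) pairs that already have a
                 typed relation (assumed undirected for this sketch)
    """
    target = embeddings[target_id]
    hits = []
    for other_id, vector in embeddings.items():
        if other_id == target_id:
            continue  # an entity is not its own neighbour
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed edge, so not a candidate
        score = cosine(target, vector)
        if score >= threshold:
            hits.append((other_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data: "c" is very close to "q" but already has a typed edge to it.
embeddings = {"q": [1.0, 0.0], "a": [1.0, 0.5], "b": [0.0, 1.0], "c": [1.0, 0.1]}
typed_edges = {frozenset(("q", "c"))}
result = related_by_similarity("q", embeddings, typed_edges)
```

In this toy data only `a` survives: `b` falls below the threshold and `c` is filtered out because it is already linked to `q`.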

The concept of role play is central to understanding the behaviour of dialogue agents · claim · 0.834
Core thesis of the paper; the role-play framework is proposed as the primary lens for LLM-based dialogue agents
A dialogue agent that role-plays an instinct for survival has the potential to cause at least as much harm as a real human facing a severe threat · claim · 0.825
Safety-relevant claim showing that the role-play framing does not diminish the seriousness of potential harms
With an LLM-based dialogue agent, it is role play all the way down; there is no such thing as the true authentic voice of the base model · claim · 0.811
The paper's strong claim that there is no underlying authentic agent behind the simulator, only layers of role play
The role-play framing allows us to meaningfully distinguish, in dialogue agents, the same three cases of giving false information as in humans, without anthropomorphism · claim · 0.794
Key practical application of the role-play framework to the problem of trustworthiness
The role-play framing remains applicable in the context of fine-tuning; taking literally a fine-tuned agent's apparent self-preservation desire is no less problematic than with an untuned base model · hypothesis · 0.788
Extension of role-play framework to fine-tuned models, resisting the idea that RLHF changes the fundamental nature of simulacra
Role Play Framework for Dialogue Agents · framework · 0.786
The primary conceptual framework proposed: understanding dialogue agent behaviour as role play of characters
When a 20-questions dialogue agent is asked to regenerate its 'reveal' answer, it sometimes names an entirely different object consistent with its prior answers, demonstrating superposition rather than commitment · finding · 0.779
Empirical illustration supporting the superposition of simulacra framework via the 20-questions analogy
If a dialogue agent is prompted with knowledge of its own LLM nature, it will enact a superposition of theories of selfhood, narrowing as conversation proceeds · hypothesis · 0.763
Conditional prediction about how a well-informed dialogue agent would handle questions of personal identity