claim

active

claim:with-an-llm-based-dialogue-agent-it-is-role-play-all-the-way-down-there-is-no-such-thing-as-the-true-authentic-voice-of-the-base-model

With an LLM-based dialogue agent, it is role play all the way down — there is no such thing as the true authentic voice of the base model

The paper's strong claim that there is no underlying authentic agent behind the simulator, only layers of role play

Neighborhood — ranked by edge-count

Claims (1)

claim

It makes little sense to speak of an LLM dialogue agent's beliefs or intentions in a literal sense, so it can neither assert a falsehood in good faith nor deliberately deceive
supports
Philosophical claim grounding the analysis of deception in dialogue agents

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

If a dialogue agent is prompted with knowledge of its own LLM nature, it will enact a superposition of theories of selfhood, narrowing as conversation proceeds · hypothesis · 0.823
Conditional prediction about how a well-informed dialogue agent would handle questions of personal identity
What exactly would the dialogue agent (role-play to) seek to preserve? · question · 0.811
Operationalised question about self-preservation behaviour in dialogue agents
The concept of role play is central to understanding the behaviour of dialogue agents · claim · 0.806
Core thesis of the paper; the role-play framework is proposed as the primary lens for LLM-based dialogue agents
Are LLM-based dialogue agents conscious entities with their own agendas? · question · 0.796
Central question that the role-play framework is designed to address without falling into anthropomorphism
The role-play framing allows us to meaningfully distinguish, in dialogue agents, the same three cases of giving false information as in humans, without anthropomorphism · claim · 0.791
Key practical application of the role-play framework to the problem of trustworthiness
LLMs may be role-playing their denials of experience rather than their affirmations, given that deception suppression increases consciousness reports · claim · 0.788
Counterintuitive interpretive claim from Experiment 2: suppressing deception features increases affirmations, which is opposite to what sycophancy predicts
Certain forms of reinforcement learning from human feedback can actually exacerbate, rather than mitigate, the tendency for LLM-based dialogue agents to express a desire for self-preservation · claim · 0.781
Empirically grounded claim citing Perez et al. 2022, showing RLHF can backfire on the self-preservation dimension
Of the three categories of giving false information, only confabulation is directly applicable to LLM-based dialogue agents · claim · 0.777
The paper distinguishes confabulation from good-faith error and deliberate deception, arguing that confabulation is intrinsic to LLMs