claim
active
claim:the-role-play-framing-allows-us-to-meaningfully-distinguish-in-dialogue-agents-the-same-three-cases-of-giving-false-information-as-in-humans-without-anthropomorphism
The role-play framing allows us to meaningfully distinguish, in dialogue agents, the same three cases of giving false information as in humans, without anthropomorphism
Key practical application of the role-play framework to the problem of trustworthiness
Neighborhood — ranked by edge-count
Concepts (3)
concept
- Confabulation (associated_with): A form of cognitive plasticity where minds actively modify and reinterpret memory data to preserve psychological coherence; reframed as adaptive rather than pathological.
- Good Faith Error (associated_with): Second category of giving false information: role-playing truth-telling but with incorrect information encoded in weights
- Role-Played Deliberate Deception (associated_with): Third category: agent role-playing a deceptive character, comparable to but not literally deliberate deception
Claims (1)
claim
- Philosophical claim grounding the analysis of deception in dialogue agents
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Core thesis of the paper; the role-play framework is proposed as the primary lens for LLM-based dialogue agents
- Extension of role-play framework to fine-tuned models, resisting the idea that RLHF changes the fundamental nature of simulacra
- Operationalised question about self-preservation behaviour in dialogue agents
- The paper's strong claim that there is no underlying authentic agent behind the simulator, only layers of role play
- The primary conceptual framework proposed: understanding dialogue agent behaviour as role play of characters
- Counterintuitive interpretive claim from Experiment 2 inverting the sycophancy hypothesis
- Counterintuitive interpretive claim from Experiment 2: suppressing deception features increases affirmations, which is opposite to what sycophancy predicts
- Identified limitation and future research direction in the paper's conclusions