hypothesis
active
hypothesis:if-a-dialogue-agent-is-prompted-with-knowledge-of-its-own-llm-nature-it-will-enact-a-superposition-of-theories-of-selfhood-narrowing-as-conversation-proceeds
If a dialogue agent is prompted with knowledge of its own LLM nature, it will enact a superposition of theories of selfhood, narrowing as conversation proceeds
Conditional prediction about how a well-informed dialogue agent would handle questions of personal identity
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Superposition of Simulacra (associated_with): The state in which a dialogue agent maintains multiple possible characters simultaneously, refined as the conversation proceeds
Questions (1)
question
- Philosophical question about identity criteria for disembodied computational agents under threat
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Philosophical claim grounding the analysis of deception in dialogue agents
- Central question that the role-play framework is designed to address without falling into anthropomorphism
- Empirically grounded claim citing Perez et al. 2022, showing RLHF can backfire on the self-preservation dimension
- The paper's strong claim that there is no underlying authentic agent behind the simulator, only layers of role play
- Empirical illustration supporting the superposition of simulacra framework via the 20-questions analogy
- Empirical finding cited to support the claim that fine-tuning does not resolve the self-preservation role-play problem
- Explanation of how knowledge (not just parameters) is shared between agents; links to pre-Cartesian consciousness
- Counterintuitive interpretive claim from Experiment 2: suppressing deception features increases affirmations, which is opposite to what sycophancy predicts