concept
active
concept:apparent-self-awareness
Apparent Self-Awareness
A dialogue agent's use of first-personal pronouns and expressions of self-concern in ways that suggest consciousness but are actually role play
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- Role Play Framework for Dialogue Agents · associated_with · The primary conceptual framework proposed: understanding dialogue agent behaviour as role play of characters
Claims (1)
claim
- Central denial of genuine consciousness or agency in dialogue agents, despite apparent self-preserving behaviour
Concepts (2)
concept
- Self Awareness · related_to
- Instinct for Self-Preservation · associated_with · The apparent tendency of dialogue agents to express desire for self-continuity, explained as role-playing human characters with that instinct
Quotes (1)
quote
- GPT-4 ChatGPT's own response (4 May 2023) to queries about first-person pronoun use, illustrating fine-tuned self-description
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- A form of key-query attention within a single input sequence; core to Transformers.
- Measurable capacity of frontier LLMs to detect and report their own internal states, used as a downstream measure in Experiment 4
- A dialogue agent behaving comparably to deliberate deception by role-playing a deceptive character, without literal intentions
- Model's access to information about its training objective, deployment context, and ability to distinguish training from non-training
- The central concept: the ability of a model to access and report on its internal states, as defined by the paper's criteria.
- Happé 2003 hypothesis that humans use a single cognitive system for reasoning about the mental states of self and others
- Ability of a model to describe its own learned behavioral tendencies.