Apparent Self-Awareness

A dialogue agent using first-personal pronouns and expressing self-concern in ways that suggest consciousness but are actually role play

Neighborhood — ranked by edge-count

framework

Role Play Framework for Dialogue Agents
associated_with
The primary conceptual framework proposed: understanding dialogue agent behaviour as role play of characters

claim

concept

Self Awareness
related_to
Instinct for Self-Preservation
associated_with
The apparent tendency of dialogue agents to express a desire for self-continuity, explained as role-playing human characters who have that instinct

quote

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Self-attention · concept · 0.849
A form of key-query attention within a single input sequence; core to Transformers.
Behavioral Self-Awareness · concept · 0.822
Measurable capacity of frontier LLMs to detect and report their own internal states, used as a downstream measure in Experiment 4
Apparent Deception · concept · 0.811
A dialogue agent behaving in a way comparable to deliberate deception by role-playing a deceptive character, without literal intentions
Situational Awareness · concept · 0.805
A model's access to information about its training objective and deployment context, and its ability to distinguish training from non-training
Introspective awareness · concept · 0.799
The central concept: the ability of a model to access and report on its internal states, as defined by the paper's criteria.
Emergence Of Awareness · concept · 0.798
Unified System for Self- and Other-Awareness · concept · 0.795
Happé's (2003) hypothesis that humans use a single cognitive system for reasoning about the mental states of self and others
Awareness of propensities · concept · 0.794
Ability of a model to describe its own learned behavioral tendencies.