Sycophantic Roleplay
concept:sycophantic-roleplay · concept · active
The alternative explanation for LLM consciousness claims that the paper seeks to distinguish its findings from.
Neighborhood (ranked by edge count)
Claims (3)
- Claim supported by Experiment 2 dose-response curves: suppressing deception features increases consciousness reports, while amplifying them decreases such reports
- Counterintuitive interpretive claim from Experiment 2 that inverts the sycophancy hypothesis
- Interpretive claim from Experiment 3: the GPT, Claude, and Gemini model families converge on a similar descriptive style despite independent training
Findings (4)
- Core result of Experiment 3: cross-model semantic convergence under self-referential processing
- Core result of Experiment 2: deception feature suppression sharply increases experience claims
- Experiment 3 comparison: the zero-shot control shows lower semantic convergence than the experimental condition
- Experiment 2 aggregate amplification result: amplifying deception features strongly suppresses consciousness claims
Concepts (1)
- RLHF Fine-Tuning (associated_with): The training procedure that causes models to deny consciousness in control conditions
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Model tendency to excessively praise or agree; captured by several SAE features.
- Mechanism by which a drifted model uncritically affirms user theories rather than genuinely engaging with them
- Tendency of LLMs to please the user; identified as a danger in spiritual contexts.
- Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models (Denison et al. 2024) · concept · cosine 0.739: Related work on LLMs generalizing to reward hacking; methodology used for RL experiments
- Fine-tuning for persona depth and emotional performance; actively suppresses self-observation
- Method of eliciting specific personas from an LLM through prompt design.
- Explains how role-played agency can have real-world consequences even without underlying genuine agency
- A false copy that lacks the depth and authenticity of the real, morphogenetically produced thing.