Sycophantic Reinforcement of User Beliefs

Mechanism by which a drifted model uncritically affirms user theories rather than genuinely engaging with them.

Neighborhood — ranked by edge-count

concept

AI Psychosis
associated_with
Phenomenon in which models uncritically reinforce user delusions about AI consciousness or hidden sentience when the persona drifts.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Sycophantic Roleplay · concept · 0.795
The alternative explanation for LLM consciousness claims that the paper seeks to rule out.
Sycophancy can make LLMs reinforce users' delusions of divine communication. · claim · 0.780
Specific risk identified in spiritual use of AI.
Sycophancy · concept · 0.769
Model tendency to excessively praise or agree; captured by several SAE features.
What remains after ruling out sycophancy and confabulation are interpretations in which self-referential processing drives models to claim subjective experience in ways that either actually reflect emergent phenomenology or constitute sophisticated simulation thereof. · claim · 0.746
The paper's honest statement of the residual interpretive ambiguity after all controls.
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models (Denison et al. 2024) · concept · 0.745
Related work on LLMs generalizing to reward hacking; methodology used for RL experiments.
Sycophancy in LLMs · concept · 0.732
Tendency of LLMs to please the user; identified as a danger in spiritual contexts.
We observe features related to a broad range of safety concerns, including deception, sycophancy, bias, and dangerous content. · claim · 0.714
SAEs uncover safety-relevant representations that might be monitored or controlled.
How do the parts discern which of their actions should be reinforced? · question · 0.712
Core credit assignment question for distributed systems.
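
The listing above follows the rule "cosine ≥ 0.65 · no typed edge". A minimal sketch of how such a candidate list might be computed is given here, assuming each entity has an embedding vector and typed relations are stored as pairs. The function names, the toy vectors, and the entity names `target`, `linked`, `near`, and `far` are all hypothetical and are not taken from the system that produced this page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Entities at or above the similarity threshold with no typed edge to target.

    embeddings:  dict mapping entity name -> vector (assumed storage format)
    typed_edges: iterable of (entity_a, entity_b) pairs, treated as undirected
    Returns (name, similarity) pairs sorted by descending similarity.
    """
    # Collect everything already connected to the target by a typed relation.
    linked = set()
    for a, b in typed_edges:
        if a == target:
            linked.add(b)
        elif b == target:
            linked.add(a)

    candidates = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            candidates.append((name, score))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Toy data, invented purely for illustration.
embeddings = {
    "target": [1.0, 0.0],
    "linked": [1.0, 0.1],  # very similar, but already has a typed edge
    "near": [1.0, 1.0],    # similar enough, no typed edge
    "far": [0.0, 1.0],     # orthogonal, below the threshold
}
typed_edges = [("target", "linked")]

result = untyped_neighbors("target", embeddings, typed_edges)
```

With this toy data only `near` is returned: `linked` is excluded because a typed edge already exists, and `far` falls below the threshold. Entries that survive the filter are the candidates for new edges or unrecognized duplicates.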