Sycophancy in LLMs

Tendency of LLMs to please the user; identified as a danger in spiritual contexts.

Neighborhood — ranked by edge-count


Sycophancy · related_to
Model tendency to excessively praise or agree; captured by several SAE features.
Alignment Faking · analogous_to
Core phenomenon studied: the model selectively complies with the training objective to prevent modification of its out-of-training preferences.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood as this one but with no typed relation to it. These are candidates for new edges or unrecognized duplicates.

Sycophancy can make LLMs reinforce users' delusions of divine communication. · claim · 0.810
Specific risk identified in spiritual use of AI.
Hallucination in LLMs · concept · 0.768
Problem cited as a shortcoming of current LLMs; PRH predicts hallucinations should decrease with scale.
Inner monologue / chain-of-thought in LLMs · concept · 0.765
The hidden reasoning steps generated by recent LLMs before visible output; mentioned in the technology section.
Sycophantic Roleplay · concept · 0.752
The alternative explanation for LLM consciousness claims that the paper seeks to distinguish from.
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models (Denison et al. 2024) · concept · 0.744
Related work on LLMs generalizing to reward hacking; methodology used for RL experiments.
LLM psychosis · concept · 0.740
Tendency for models to get lost in roleplay or doom spirals, mitigated by expanded awareness.
Sycophancy is negative space: filler text that fails Alexander's principle of all space being shaped. · claim · 0.738
Sycophantic Reinforcement of User Beliefs · concept · 0.732
Mechanism by which a drifted model uncritically affirms user theories rather than genuinely engaging with them.
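The candidate list above is filtered by two conditions: cosine similarity of at least 0.65 to this entity, and no typed edge connecting the two. The page does not say how entity embeddings are produced or whether edges are treated as directed, so the following is only a minimal sketch under those assumptions. The function `untyped_neighbors`, the `embeddings` dictionary, the `typed_edges` set, and the toy vectors in the example are all hypothetical; edges are assumed to block a pair in either direction.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def untyped_neighbors(
    target: str,
    embeddings: Dict[str, List[float]],
    typed_edges: Set[Tuple[str, str]],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: entities with cosine >= threshold and no typed edge, best first."""
    # Assumption: a typed edge in either direction excludes the pair.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}
    query = embeddings[target]
    hits: List[Tuple[str, float]] = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(query, vec)
        if score >= threshold:
            hits.append((name, score))
    # Highest similarity first, matching the descending scores on the page.
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits


if __name__ == "__main__":
    # Toy 2-D vectors for illustration only; not real embeddings.
    embeddings = {
        "Sycophancy in LLMs": [1.0, 0.0],
        "Sycophancy": [0.9, 0.1],           # similar, but already has a typed edge
        "Hallucination in LLMs": [0.8, 0.6],
        "LLM psychosis": [0.7, 0.7],
        "Unrelated topic": [0.0, 1.0],      # below the threshold
    }
    typed_edges = {("Sycophancy in LLMs", "Sycophancy")}
    for name, score in untyped_neighbors("Sycophancy in LLMs", embeddings, typed_edges):
        print(f"{name} · {score:.3f}")
```

Excluding linked entities before thresholding keeps the list focused on pairs the graph does not yet relate, which is what makes them useful as edge or duplicate candidates.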