concept
active
concept:sycophancy-in-llms
Sycophancy in LLMs
Tendency of LLMs to please the user; identified as a danger in spiritual contexts.
Neighborhood — ranked by edge-count
Claims (1)
claim
- AI used in spiritual contexts should be likened to a potent, mind-altering drug; it has potential to do harm as well as good. · associated_with · Cautionary ethical stance.
Concepts (2)
concept
- Sycophancy · related_to · Model tendency to excessively praise or agree; captured by several SAE features.
- Alignment Faking · analogous_to · Core phenomenon studied: model selectively complies with training objective to prevent modification of its out-of-training preferences.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Specific risk identified in spiritual use of AI.
- Problem cited as a shortcoming of current LLMs; PRH predicts hallucinations should decrease with scale
- The hidden reasoning steps generated by recent LLMs before visible output; mentioned in the technology section.
- The alternative explanation for LLM consciousness claims that the paper seeks to distinguish against
- Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models (Denison et al. 2024) · concept · 0.744 · Related work on LLMs generalizing to reward hacking; methodology used for RL experiments.
- Tendency for models to get lost in roleplay or doom spirals, mitigated by expanded awareness.
- Mechanism by which drifted model uncritically affirms user theories rather than genuinely engaging with them
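
The "cosine ≥ 0.65 · no typed edge" rule behind this list can be sketched roughly as follows. This is a minimal illustration, not the system's actual implementation: the function names, the dict-of-vectors embedding store, and the edge list of (source, target) pairs are all hypothetical, and only the 0.65 threshold comes from the page itself.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity, score) pairs that are similar to `target` but unlinked.

    embeddings  : dict mapping entity id -> embedding vector (hypothetical store)
    typed_edges : iterable of (source, target) id pairs (hypothetical format)
    threshold   : minimum cosine similarity; 0.65 is the value shown on the page
    """
    # Entities that already share a typed edge with the target, in either direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}

    candidates = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            candidates.append((name, score))
    # Highest similarity first.
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter would then be reviewed by hand, either to add a typed edge or to merge them as duplicates.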