concept
active
concept:sycophancy-in-llms

Sycophancy in LLMs

Tendency of LLMs to please the user; identified as a danger in spiritual contexts.

Neighborhood — ranked by edge-count

Concepts (2)

  • Sycophancy
    related_to
    Model tendency to excessively praise or agree; captured by several SAE features.
  • Alignment Faking
    analogous_to
    Core phenomenon studied: a model selectively complies with its training objective to prevent modification of its out-of-training preferences.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
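
The selection rule above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal illustration, not the graph tool's actual implementation: the function names, the entity names, the two-dimensional vectors, and the representation of typed edges as pairs are all hypothetical and chosen only to make the example runnable.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that share no typed edge with target."""
    # Collect every entity already connected to the target, in either direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}

    results = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    # Highest similarity first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): names and vectors are placeholders, not real graph entries.
embeddings = {
    "target": [1.0, 0.0],
    "linked": [0.9, 0.1],  # similar, but already has a typed edge
    "near": [0.8, 0.6],    # similar, no typed edge
    "far": [0.0, 1.0],     # orthogonal to the target
}
typed_edges = [("target", "linked")]

candidates = similar_without_edge("target", embeddings, typed_edges)
for name, score in candidates:
    print(f"{name}: {score:.2f}")
```

With this toy data only `near` is reported: `linked` is excluded because it already has a typed edge, and `far` falls below the threshold. Whether a reported entity deserves a new edge or is an unrecognized duplicate still needs manual review.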