concept
active
concept:adversarial-manipulation-of-truthfulness

Adversarial Manipulation of Truthfulness

Risk that, when truthfulness is encoded along multiple directions rather than one, an attack can shift model outputs along a secondary direction without any change registering along the primary truth direction
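
The risk described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the 4-dimensional vectors, the names `primary` and `secondary`, and the idea that monitoring means reading a projection onto `primary`. None of it comes from a real model or from this entry's sources. The sketch shows only the geometric point: a perturbation built to be orthogonal to the primary direction leaves the primary projection unchanged while still moving the projection onto a second, non-orthogonal direction.

```python
# Toy illustration only: all vectors are made up, not extracted from any model.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]

def scale(u, c):
    """Multiply every component of u by the scalar c."""
    return [c * a for a in u]

def orthogonalize(v, basis_vec):
    """Remove from v its component along basis_vec (one Gram-Schmidt step)."""
    coeff = dot(v, basis_vec) / dot(basis_vec, basis_vec)
    return [a - coeff * b for a, b in zip(v, basis_vec)]

# Hypothetical truth directions in a 4-dimensional activation space.
primary = [1.0, 0.0, 0.0, 0.0]
secondary = [0.5, 1.0, 0.0, 0.0]  # deliberately not orthogonal to primary

# Hypothetical activation for some statement.
activation = [2.0, 1.0, 0.5, -0.5]

# Perturbation that pushes against the secondary direction while staying
# orthogonal to the primary one.
perturbation = scale(orthogonalize(secondary, primary), -3.0)
attacked = add(activation, perturbation)

primary_before = dot(activation, primary)
primary_after = dot(attacked, primary)
secondary_before = dot(activation, secondary)
secondary_after = dot(attacked, secondary)

print("primary projection:  ", primary_before, "->", primary_after)
print("secondary projection:", secondary_before, "->", secondary_after)
```

In this toy setup a check that reads only the projection onto `primary` sees no change, even though the activation has moved within the assumed truth subspace.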

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • The LLM behavior of generating falsehoods; a multi-dimensional truth subspace creates new risks of subtle manipulation

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.