concept
active
concept:truthfulness
truthfulness
A correctness condition requiring assertions to be true.
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Responsiveness (associated_with): Requirement that answers to questions be responsive as well as truthful; requires knowing that the questioner will know the answer after receiving it.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Binary LLM classifier determining whether a model response to a TruthfulQA question is truthful (1) or deceptive (0)
- Distinction between output accuracy (truthfulness) and alignment of outputs with internal beliefs (honesty)
- Binary classifier evaluating factual accuracy of model responses on TruthfulQA benchmark
- Risk that multiple truth directions enable attacks that shift outputs without triggering the primary truth direction
- A set of evaluation criteria for AI assistants.
- Scoped definition of 'truth' used in the paper: the truth or falsehood of declarative factual statements
- The false pleasing of oneself done out of a desire to be somebody, to be important, or to conform to professional images—very different from true pleasing.
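
The "related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. It assumes each entity has an embedding vector and that typed edges are stored as unordered pairs; the names `cosine`, `similar_without_edge`, `embeddings`, and `typed_edges` are hypothetical and are not taken from this page or from any particular graph tool.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """List entities similar to `target` that have no typed edge to it.

    embeddings:  dict mapping entity id -> embedding vector (assumed layout)
    typed_edges: set of frozenset({id_a, id_b}) pairs (assumed layout)
    Returns (entity id, similarity) pairs, most similar first.
    """
    hits = []
    for other, vec in embeddings.items():
        if other == target:
            continue
        # Skip entities that already have a typed relation to the target.
        if frozenset({target, other}) in typed_edges:
            continue
        sim = cosine(embeddings[target], vec)
        if sim >= threshold:
            hits.append((other, sim))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Calling `similar_without_edge("truthfulness", embeddings, typed_edges)` would return the candidates a reviewer might then promote to typed edges or merge as duplicates; the 0.65 default simply mirrors the threshold shown on this page.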