concept
active
concept:text-probability-vs-truth-distinction
Text Probability vs Truth Distinction
The distinction between a statement being likely to appear in training data and a statement being factually true
Neighborhood — ranked by edge-count
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Distinction between output accuracy (truthfulness) and alignment of outputs with internal beliefs (honesty)
- The underlying truth representation may generalize across lexical choices and languages · hypothesis · 0.754 · Suggested by non-English Yes/No outputs post-intervention, requiring further investigation
- Measure of expected sensory input, core to linking value and surprise.
- Algorithmic framework for probabilistic inference in graphical models.
- Can we disambiguate truth from closely related features such as 'commonly believed' or 'verifiable'? · question · 0.742 · Limitation noted in §7.1: scope restricted to simple statements prevents disambiguation
- Patching experiments localize truth representations to these specific hidden states in LLaMA-2 models
- Scoped definition of 'truth' used in the paper: the truth or falsehood of declarative factual statements
- Shows that truth representations are not reducible to text probability representations