question

active

question:can-truth-representations-be-disambiguated-from-closely-related-features-such-as-commonly-believed-or-verifiable-using-simple-factual-statements

Can truth representations be disambiguated from closely related features such as 'commonly believed' or 'verifiable' using simple factual statements?

Acknowledged limitation: simple uncontroversial statements cannot distinguish truth from related epistemic features

Source paper

extracted_from

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

(2023) · Samuel Marks · Max Tegmark

Neighborhood — ranked by edge-count

Papers (1)

paper

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
associated_with

Claims (1)

claim

LLMs linearly represent truth-relevant information beyond the plausibility of text, as evidenced by probes trained on the likely dataset performing poorly on datasets where truth and text probability are anti-correlated
gates
Establishes that the observed linear structure is not merely a representation of text probability

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Can we disambiguate truth from closely related features such as 'commonly believed' or 'verifiable'? · question · 0.921
Limitation noted in §7.1: scope restricted to simple statements prevents disambiguation
The underlying truth representation may generalize across lexical choices and languages · hypothesis · 0.806
Suggested by non-English Yes/No outputs post-intervention, requiring further investigation
Truth may be linearly separable in the model's representation space, but the structure is richer than a single linear axis · claim · 0.794
Interpretive synthesis of DIM and cone intervention successes
Does the multi-directional nature of truth imply an underlying nonlinear representation, or is it compatible with linear separability? · question · 0.788
Theoretical open question about the geometry of truth in LLMs raised in Discussion
The relationship between representations of truth of input statements and of model outputs in conjunction with model performance has not been investigated. · question · 0.778
Future work direction identified in conclusion for enabling reliable truth assessment methods.
Universality claims for truth directions are more limited than previously assumed, with significant differences observable for various model layers, task difficulties, task types, and prompt templates. · claim · 0.773
Overarching conclusion summarizing the paper's contribution relative to prior universality claims.
Do LLMs have a unified representation of truth that spans structurally and topically diverse data? · question · 0.772
Central research question driving dataset design and experimental approach
What if the concept being manipulated does not lie on a straight line in the model's representations? · question · 0.770
The motivating question that opens the paper and leads to the development of manifold steering.
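
The "Related by similarity" listing above can be read as a filter: keep entities whose embedding has cosine similarity of at least 0.65 with this one and that share no typed edge with it, then rank by score. A minimal sketch of that filter follows. The function names, the toy embeddings, and the edge representation are all hypothetical; how this page actually computes its embeddings and stores its edges is not stated in the source.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs above the threshold with no typed edge to target.

    `embeddings` maps entity name -> vector and `typed_edges` is a set of
    (source, destination) pairs. Both structures are assumptions for illustration.
    """
    # Entities already connected to the target by any typed edge are excluded.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}

    hits = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    # Highest similarity first, matching the ordering of the listing.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy data: "linked" is identical to the target but already has a typed edge.
embeddings = {
    "q": [1.0, 0.0],
    "near": [0.9, 0.1],
    "far": [0.0, 1.0],
    "linked": [1.0, 0.0],
}
typed_edges = {("q", "linked")}
result = related_by_similarity("q", embeddings, typed_edges)
```

In this toy case only "near" survives: "far" falls below the threshold and "linked" is dropped because it already has a typed relation. A near-1.0 score with no typed edge, such as the 0.921 entry at the top of the listing, is the kind of result that suggests an unrecognized duplicate.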