concept
active
concept:truth-direction-in-llm-latent-space
Truth Direction in LLM Latent Space
A specific direction in an LLM's residual stream that encodes the truth or falsehood of factual statements
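To make the definition concrete, the following is a minimal sketch of what "a direction that encodes truth" could mean operationally. It assumes a difference-of-means estimate, which is one common way to obtain such a direction and is not confirmed here as the method of any linked paper. The activations are synthetic stand-ins for residual-stream vectors, and the names `mean_difference_direction`, `truth_scores`, and `planted` are hypothetical.

```python
import numpy as np


def mean_difference_direction(acts, labels):
    """Unit vector from the mean 'false' activation to the mean 'true' one.

    Assumption: a difference-of-means estimate, chosen for illustration only.
    """
    acts = np.asarray(acts, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    direction = acts[labels].mean(axis=0) - acts[~labels].mean(axis=0)
    return direction / np.linalg.norm(direction)


def truth_scores(acts, direction):
    """Scalar projection of each activation vector onto the direction."""
    return np.asarray(acts, dtype=float) @ direction


# Synthetic stand-in for residual-stream activations (not real model data):
# true statements are shifted along a planted axis, false ones against it.
rng = np.random.default_rng(0)
dim = 16
planted = np.zeros(dim)
planted[0] = 1.0
labels = np.array([True] * 50 + [False] * 50)
noise = rng.normal(scale=0.1, size=(100, dim))
acts = noise + np.where(labels[:, None], planted, -planted)

direction = mean_difference_direction(acts, labels)
scores = truth_scores(acts, direction)

# On this toy data the sign of the projection separates the two classes.
predicted_true = scores > 0
```

In this toy setting the recovered direction aligns closely with the planted axis. Whether a direction estimated this way generalizes across tasks and datasets in a real model is an empirical question that the sketch does not address.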
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (3)
concept
- Truth direction in LLMs (related_to): Linear direction in LLM activations associated with truthfulness, identified by Burns et al. 2022 and Azaria & Mitchell 2023
- Linear Representation of Features (associated_with): The central object of study: the idea that a concept like truth is encoded as a direction in the LLM's latent space
- Factuality (associated_with): Scoped definition of 'truth' used in the paper: the truth or falsehood of declarative factual statements
Artifacts (1)
artifact
- Code, datasets, and interactive data explorer released with the paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Where inside the LLM should we look for an accurate truth direction that will generalize the most across tasks? (question, 0.832): One of the three guiding research questions of the paper.
- Central empirical conclusion of the paper about the fundamental limits of truth directions.
- Core claim of ReflCtrl that a single direction captures and controls reflection
- A hypothesized direction in LLM activation space that encodes the truth or falsehood of factual statements
- Key limitation and open question about experimental scope.
- Central interpretive claim of the paper
- Do LLMs have a unified representation of truth that spans structurally and topically diverse data? (question, 0.758): Central research question driving dataset design and experimental approach
- The paper's central construct: a vector in LLM activation space encoding the transition between reflection levels.