concept
active
concept:truth-direction-in-llm-latent-space

Truth Direction in LLM Latent Space

A specific direction in an LLM's residual stream whose component encodes whether a factual statement is true or false.
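
As a rough illustration of what "a direction that encodes truth" means, the sketch below estimates one by the difference between mean activations of true and false statements, then scores statements by projecting onto it. Difference of means is a technique chosen here for simplicity; it is not claimed to be the method of any work cited on this page. The activations are synthetic random vectors with a planted signal, not real residual-stream data, and the names `truth_direction`, `project`, `d_model`, and `planted` are hypothetical.

```python
import numpy as np

def truth_direction(acts, labels):
    """Unit vector pointing from the mean 'false' activation to the mean 'true' one."""
    acts = np.asarray(acts, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = acts[labels].mean(axis=0) - acts[~labels].mean(axis=0)
    return diff / np.linalg.norm(diff)

def project(acts, direction):
    """Scalar score per statement: the component of its activation along the direction."""
    return np.asarray(acts, dtype=float) @ direction

# Synthetic stand-in for residual-stream activations (assumption: no real model is used).
rng = np.random.default_rng(0)
d_model = 16                       # hypothetical hidden size
planted = np.zeros(d_model)
planted[0] = 1.0                   # the direction planted into the toy data
labels = np.array([True] * 50 + [False] * 50)
signal = np.where(labels[:, None], 3.0, -3.0) * planted
acts = rng.normal(size=(100, d_model)) + signal

direction = truth_direction(acts, labels)
scores = project(acts, direction)  # higher score means "more true" in this toy setup
```

On real models the activations would be read from a chosen layer and token position, and a single direction need not separate true from false statements this cleanly.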

Neighborhood — ranked by edge-count

Concepts (3)

concept
  • Linear direction in LLM activations associated with truthfulness, identified by Burns et al. 2022 and Azaria & Mitchell 2023
  • The central object of study: the idea that a concept such as truth is encoded as a direction in the LLM's latent space
  • Factuality (associated_with): the scoped definition of 'truth' used in the paper, namely the truth or falsehood of declarative factual statements

Artifacts (1)

artifact

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.