claim
active
claim:a-small-group-of-causally-implicated-hidden-states-encodes-llm-truth-representations-localized-over-clause-ending-punctuation-tokens

A small group of causally-implicated hidden states encodes LLM truth representations, localized over clause-ending punctuation tokens

Localization result from activation-patching experiments: identifies the group (b) hidden states as the locus of truth representations

Neighborhood — ranked by edge-count

Findings (2)

finding

Concepts (1)

concept
  • The phenomenon where LLMs encode clause-level information over clause-ending punctuation tokens rather than the final content token
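To make the concept above concrete, here is a minimal sketch of how one might pick out the two candidate positions per clause: the clause-ending punctuation token and the final content token just before it. The punctuation set, the pre-split token list, and the helper name `clause_summary_positions` are all illustrative assumptions, not taken from the source; a real analysis would use the model's own tokenizer.

```python
# Hypothetical helper: which punctuation counts as "clause-ending" is an assumption.
CLAUSE_END = {".", ",", ";", ":", "!", "?"}

def clause_summary_positions(tokens):
    """Return (punctuation_index, last_content_index) pairs, one per clause."""
    pairs = []
    for i, tok in enumerate(tokens):
        # A clause ends where punctuation directly follows a non-punctuation token.
        if tok in CLAUSE_END and i > 0 and tokens[i - 1] not in CLAUSE_END:
            pairs.append((i, i - 1))
    return pairs

# Toy, already-tokenized input (assumed whitespace-style tokens).
tokens = ["Paris", "is", "in", "France", ",", "and", "Rome", "is", "in", "Italy", "."]
positions = clause_summary_positions(tokens)
print(positions)
```

Under the concept described here, hidden states read at the first index of each pair (the punctuation) would be expected to carry the clause-level information, rather than those at the second index.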

Methods (1)

method
  • Technique to localize causally implicated hidden states by swapping residual stream activations between a true and false input and measuring downstream log-probability changes
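The method above can be sketched on a toy model. This is a rough illustration under stated assumptions, not the experimental setup behind the claim: the "model" is a tiny causal residual network written in pure Python, the inputs are random vectors standing in for token embeddings, and every name (`make_toy_model`, `forward`, `patching_effects`) is hypothetical. It shows the core loop only: cache the residual states from the true input, rerun the false input with one (layer, position) state swapped in, and record the change in the log-probability of the target output.

```python
import math
import random

def make_toy_model(n_layers=3, dim=4, seed=0):
    """Random weights for a toy residual network (a stand-in, not a real LLM)."""
    rng = random.Random(seed)
    layers = [[[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]
              for _ in range(n_layers)]
    unembed = [[rng.gauss(0, 1.0) for _ in range(dim)] for _ in range(2)]
    return layers, unembed

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(model, embeds, patch=None):
    """Run the toy model; patch=(layer, pos, vector) overwrites one residual state.

    Returns (log-probs over 2 outputs, cache of residual states after each layer).
    """
    layers, unembed = model
    h = [list(v) for v in embeds]
    cache = []
    for l, W in enumerate(layers):
        new_h = []
        for t in range(len(h)):
            prefix = h[:t + 1]  # causal: position t only sees positions <= t
            ctx = [sum(col) / len(prefix) for col in zip(*prefix)]
            update = [math.tanh(z) for z in matvec(W, ctx)]
            new_h.append([a + b for a, b in zip(h[t], update)])  # residual add
        if patch is not None and patch[0] == l:
            new_h[patch[1]] = list(patch[2])
        h = new_h
        cache.append([list(v) for v in h])
    logits = matvec(unembed, h[-1])  # read out from the last position only
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits], cache

def patching_effects(model, true_embeds, false_embeds, target=0):
    """effects[l][t] = change in log P(target) when the false run receives the
    true run's residual state at layer l, position t."""
    _, true_cache = forward(model, true_embeds)
    base, _ = forward(model, false_embeds)
    effects = []
    for l in range(len(model[0])):
        row = []
        for t in range(len(false_embeds)):
            patched, _ = forward(model, false_embeds,
                                 patch=(l, t, true_cache[l][t]))
            row.append(patched[target] - base[target])
        effects.append(row)
    return effects

model = make_toy_model()
rng = random.Random(1)
true_embeds = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(4)]
false_embeds = [list(v) for v in true_embeds]
false_embeds[1] = [-x for x in true_embeds[1]]  # the inputs differ at position 1 only

effects = patching_effects(model, true_embeds, false_embeds)
for l, row in enumerate(effects):
    print(f"layer {l}:", " ".join(f"{e:+.3f}" for e in row))
```

In this toy setting, positions before the differing token show no effect and the last layer matters only at the final position, so nonzero entries mark where information about the difference is carried. In a real model the same sweep would typically be done with framework hooks rather than a hand-written forward pass.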

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Restated by (1)

cosine ≥ 0.90

Other entities that say roughly the same thing. May be merge candidates or independent restatements across papers.