question
active
question:where-inside-the-llm-should-we-look-for-an-accurate-truth-direction-that-will-generalize-the-most-across-tasks
Where inside the LLM should we look for an accurate truth direction that will generalize the most across tasks?
One of the three guiding research questions of the paper.
Source paper
extracted_from · (2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi
Neighborhood — ranked by edge-count
Papers (1)
paper
Claims (1)
claim
- Argues against the single-layer analysis approach of prior work.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Linear direction in LLM activations associated with truthfulness, identified by Burns et al. 2022 and Azaria & Mitchell 2023
- Central empirical conclusion of the paper about the fundamental limits of truth directions.
- A specific direction in an LLM's residual stream that encodes the truth or falsehood of factual statements
- Motivating hypothesis for Section 5's investigation of prompt template effects.
- Key limitation and open question about experimental scope.
- Central interpretive claim of the paper
- Truth directions emerge in earlier layers for factual tasks and later layers for arithmetic tasks. · claim · 0.792 · Core empirical claim about the layer-dependence of truth direction emergence as a function of task type.
- Does instructing the model to assess correctness affect the geometry of truth directions? · question · 0.786 · One of the three guiding research questions of the paper.