claim
active
claim:truth-evaluation-framing-specifically-contributes-to-truth-geometry-shifts-beyond-generic-instruction-following-prefix
Truth-evaluation framing specifically contributes to truth geometry shifts beyond generic instruction-following prefix.
Supported by the finding that the neutral read-prompt changes emergence but does not fully replicate the cross-task generalization seen with the ask-correct prompt.
Source paper
extracted_from · (2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi
Neighborhood — ranked by edge-count
Claims (1)
claim
- Control experiment ruling out token-count as the cause of truth geometry shifts.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Motivating hypothesis for Section 5's investigation of prompt template effects.
- Establishes task difficulty as a hard limit that instructions cannot overcome.
- Does instructing the model to assess correctness affect the geometry of truth directions? · question · 0.770 · One of the three guiding research questions of the paper.
- Safety implication derived from multi-dimensional truth structure finding
- Truth directions emerge in earlier layers for factual tasks and later layers for arithmetic tasks. · claim · 0.768 · Core empirical claim about the layer-dependence of truth direction emergence as a function of task type.
- Overarching conclusion summarizing the paper's contribution relative to prior universality claims.
- Patching experiments localize truth representations to these specific hidden states in LLaMA-2 models
- The paper's generalization claim, asserting that the days-of-week finding scales to other cyclic and structured concepts.