hypothesis
active
We hypothesize that explicitly instructing the model to evaluate the correctness of the given statement may change the geometry of truth directions.
Motivating hypothesis for Section 5's investigation of prompt template effects.
Source paper
extracted_from · (2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi
Neighborhood — ranked by edge-count
Papers (1)
paper
Findings (2)
finding
- Shows the passive vs. active divide is more important than the specific wording of instructions.
- Shows that explicit instructions delay the emergence of truth directions in arithmetic tasks.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Does instructing the model to assess correctness affect the geometry of truth directions? · question · 0.911 · One of the three guiding research questions of the paper.
- Research question motivating Section 5.
- Safety implication derived from the multi-dimensional truth structure finding.
- Overarching conclusion summarizing the paper's contribution relative to prior universality claims.
- Truth directions emerge in earlier layers for factual tasks and later layers for arithmetic tasks. · claim · 0.812 · Core empirical claim about the layer-dependence of truth direction emergence as a function of task type.
- Theoretical limitation identified by the authors distinguishing reflection from stylistic tasks.
- Central empirical conclusion of the paper about the fundamental limits of truth directions.
- Where inside the LLM should we look for an accurate truth direction that will generalize the most across tasks? · question · 0.806 · One of the three guiding research questions of the paper.