question
active
question:does-instructing-the-model-to-assess-correctness-affect-the-geometry-of-truth-directions
Does instructing the model to assess correctness affect the geometry of truth directions?
One of the three guiding research questions of the paper.
Source paper
extracted_from · (2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi
Neighborhood (ranked by edge count)
Papers (1)
paper
Claims (1)
claim
- The model appears to encode truth differently under passive versus active truth-evaluation prompts. · answered_by · Key finding from Section 5, based on the low cosine similarity between the no-prompt and ask-correct probes.
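The claim's evidence rests on comparing probe directions by cosine similarity. A minimal sketch of that comparison, assuming each linear probe is summarized by its weight vector (its "truth direction"). The helper `cosine_similarity` and the vectors below are hypothetical toy values for illustration, not the paper's probes or the authors' code.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two probe direction vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


# Hypothetical toy probe weights (NOT from the paper): one probe trained on
# activations for plain statements, one on the same statements wrapped in an
# instruction asking the model whether the statement is correct.
w_no_prompt = [0.9, 0.1, 0.0, 0.4]
w_ask_correct = [0.1, 0.8, 0.5, 0.0]

sim = cosine_similarity(w_no_prompt, w_ask_correct)
print(f"cosine similarity: {sim:.3f}")
```

A value near 1 would indicate that both prompt conditions share one truth direction; a value near 0 means the directions are close to orthogonal, which is the pattern the claim describes.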
Related by similarity (8)
cosine ≥ 0.65 · no typed edge · Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Motivating hypothesis for Section 5's investigation of prompt template effects.
- Research question motivating Section 5.
- Theoretical limitation identified by the authors that distinguishes reflection tasks from stylistic tasks.
- Open question on generalization beyond the Gemma and Qwen families.
- Safety implication derived from the multi-dimensional truth-structure finding.
- Central empirical conclusion of the paper about the fundamental limits of truth directions.
- Where inside the LLM should we look for an accurate truth direction that will generalize the most across tasks? · question · 0.786 · One of the three guiding research questions of the paper.
- Interpretation of the KL-divergence retention results.
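The selection rule stated for this panel (cosine ≥ 0.65, no typed edge) can be sketched as a filter followed by a descending sort. This is a hypothetical illustration of the stated rule, not the site's implementation: `similarity_neighbors`, the `Neighbor` type, the placeholder entity IDs, and every score except 0.786 are assumptions, and scores are taken to be precomputed cosine similarities between entity embeddings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Neighbor:
    entity_id: str
    score: float  # precomputed cosine similarity to the query entity (assumed)


def similarity_neighbors(
    query_id: str,
    scores: dict[str, float],
    typed_edges: list[tuple[str, str]],
    threshold: float = 0.65,
) -> list[Neighbor]:
    """Hypothetical sketch: entities at or above `threshold` with no typed edge to `query_id`."""
    # Collect every entity already linked to the query by a typed edge, in either direction.
    linked = {b for a, b in typed_edges if a == query_id}
    linked |= {a for a, b in typed_edges if b == query_id}
    hits = [
        Neighbor(entity_id, score)
        for entity_id, score in scores.items()
        if entity_id != query_id and entity_id not in linked and score >= threshold
    ]
    # Rank the remaining candidates by similarity, highest first.
    return sorted(hits, key=lambda n: n.score, reverse=True)


# Toy example with placeholder IDs: "a" is excluded because it already has a
# typed edge to the query, and "c" falls below the threshold.
result = similarity_neighbors(
    "q",
    {"a": 0.90, "b": 0.786, "c": 0.40, "d": 0.70},
    [("q", "a")],
)
print([(n.entity_id, n.score) for n in result])
```

Excluding typed-edge neighbors is what makes the remaining list useful as a review queue: every surviving entry is either a missing relation or a possible duplicate.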