question
active
question:will-the-no-prompt-truth-directions-generalize-to-ask-correct-activations
Will the no-prompt truth directions generalize to ask-correct activations?
Specific question motivating the cross-template generalization experiment in Section 5.2.
Source paper
extracted_from · (2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi
Neighborhood — ranked by edge-count
Findings (1)
finding
- Generalization evidence that truth probes are not invariant to model instructions.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Establishes task difficulty as a hard limit that instructions cannot overcome.
- Interpretation of KL divergence retention results
- Key improvement in cross-task generalization enabled by explicit instruction framing.
- Suggestive evidence for language-independent truth representation in LLMs
- From the cross-task generalization heatmaps in Appendix B.3.3.
- Finding that explicit correctness framing partially aligns truth directions across task families.
- Shows instruction effects extend to harder factual tasks.
- Where inside the LLM should we look for an accurate truth direction that will generalize the most across tasks? · question · 0.780 · One of the three guiding research questions of the paper.