question

active

question:will-the-no-prompt-truth-directions-generalize-to-ask-correct-activations

Will the no-prompt truth directions generalize to ask-correct activations?

Specific question motivating the cross-template generalization experiment in Section 5.2.

Source paper

extracted_from

Testing the Limits of Truth Directions in LLMs

(2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi

Neighborhood — ranked by edge-count

Findings (1)

finding

No-prompt probes show a significant drop in AUROC when evaluated on ask-correct activations, especially at layers where arithmetic truth directions emerge under no-prompt.
answered_by
Generalization evidence that truth probes are not invariant to model instructions.
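
A minimal sketch of the kind of cross-template evaluation this finding describes, under loudly stated assumptions: the activations are synthetic Gaussians rather than real hidden states, the probe is a simple mean-difference direction (not necessarily the probe used in the paper), and the two templates are given orthogonal truth directions purely to make the effect visible. The names `synthetic_activations` and `fit_mean_difference_probe` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic (assumes no tied scores)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def synthetic_activations(rng, direction, n=500, dim=16):
    """Hypothetical stand-in for hidden states.

    True statements (label 1) are shifted along +direction and false
    statements (label 0) along -direction, on top of unit Gaussian noise.
    """
    labels = rng.integers(0, 2, size=n)
    x = rng.normal(0.0, 1.0, size=(n, dim))
    x += np.outer(2 * labels - 1, direction) * 2.0
    return x, labels


def fit_mean_difference_probe(x, labels):
    """Unit vector from the false-class mean to the true-class mean."""
    direction = x[labels == 1].mean(axis=0) - x[labels == 0].mean(axis=0)
    return direction / np.linalg.norm(direction)


rng = np.random.default_rng(0)
dim = 16
# Assumption for illustration only: the two templates encode truth along
# orthogonal axes. Real activations need not be this cleanly separated.
no_prompt_dir = np.eye(dim)[0]
ask_correct_dir = np.eye(dim)[1]

x_train, y_train = synthetic_activations(rng, no_prompt_dir)
x_same, y_same = synthetic_activations(rng, no_prompt_dir)
x_cross, y_cross = synthetic_activations(rng, ask_correct_dir)

# Train on "no-prompt" data, then score both held-out sets with the same probe.
probe = fit_mean_difference_probe(x_train, y_train)
auroc_same = auroc(x_same @ probe, y_same)
auroc_cross = auroc(x_cross @ probe, y_cross)

print(f"no-prompt -> no-prompt AUROC:   {auroc_same:.3f}")
print(f"no-prompt -> ask-correct AUROC: {auroc_cross:.3f}")
```

In this toy setup the same-template AUROC stays high while the cross-template AUROC falls toward chance. With real models the gap would depend on the layer and on how much the two templates' truth directions actually overlap.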

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Truth directions fail to generalize to harder tasks (F3-F5) regardless of prompt template because activations remain highly entangled. · claim · 0.829
Establishes task difficulty as a hard limit that instructions cannot overcome.
Discovered truth directions are highly specific and do not interfere with general instruction-following behavior. · claim · 0.813
Interpretation of KL divergence retention results
Under ask-correct, probes trained on arithmetic tasks A1-A3 generalize almost perfectly to factual tasks F0-F2 (AUROC ~1.0), whereas under no-prompt this generalization is largely absent. · finding · 0.794
Key improvement in cross-task generalization enabled by explicit instruction framing.
With unrestricted vocabulary, models occasionally respond in non-English Yes/No equivalents (e.g., Sí, Nein) after truth-direction interventions. · finding · 0.791
Suggestive evidence for language-independent truth representation in LLMs
The ask-arith prompt shows weaker generalization to factual tasks compared to other explicit prompts, suggesting a specialized arithmetic prompt does not create a unified truth direction across task families. · claim · 0.789
From the cross-task generalization heatmaps in Appendix B.3.3.
Using the ask-correct prompt improves cross-task generalization of arithmetic probes to factual tasks F0-F2. · claim · 0.785
Finding that explicit correctness framing partially aligns truth directions across task families.
The ask-correct template delays truth direction emergence for F3 and reduces performance for F4-F5 compared to no-prompt. · finding · 0.783
Shows instruction effects extend to harder factual tasks.
Where inside the LLM should we look for an accurate truth direction that will generalize the most across tasks? · question · 0.780
One of the three guiding research questions of the paper.