claim

active

claim:the-ask-arith-prompt-shows-weaker-generalization-to-factual-tasks-compared-to-other-explicit-prompts-suggesting-a-specialized-arithmetic-prompt-does-not-create-a-unified-truth-direction-across-task-families

The ask-arith prompt generalizes more weakly to factual tasks than the other explicit prompts, suggesting that a specialized arithmetic prompt does not create a unified truth direction across task families.

Based on the cross-task generalization heatmaps in Appendix B.3.3.

Source paper

extracted_from

Testing the Limits of Truth Directions in LLMs

(2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Using the ask-correct prompt improves cross-task generalization of arithmetic probes to factual tasks F0-F2. · claim · 0.854
Finding that explicit correctness framing partially aligns truth directions across task families.
Under ask-correct, probes trained on arithmetic tasks A1-A3 generalize almost perfectly to factual tasks F0-F2 (AUROC ~1.0), whereas under no-prompt this generalization is largely absent. · finding · 0.844
Key improvement in cross-task generalization enabled by explicit instruction framing.
Probes trained under different explicit instruction prompts (ask-correct, ask-t/f, ask-able, ask-arith) are highly aligned with each other in cosine similarity. · finding · 0.804
Shows the passive vs. active divide is more important than the specific wording of instructions.
Truth directions fail to generalize to harder tasks (F3-F5) regardless of prompt template because activations remain highly entangled. · claim · 0.795
Establishes task difficulty as a hard limit that instructions cannot overcome.
Truth probes fail to generalize to harder factual tasks F3-F5 regardless of prompt template, with AUROC near or below 0.6. · finding · 0.791
Establishes F3-F5 as a hard generalization boundary that instructions cannot overcome.
Will the no-prompt truth directions generalize to ask-correct activations? · question · 0.789
Specific question motivating the cross-template generalization experiment in Section 5.2.
Under ask-correct, arithmetic tasks A1-A2 show gradual AUROC increase peaking only in final layers, unlike the sharp transition under no-prompt. · finding · 0.776
Shows that explicit instructions delay the emergence of truth directions in arithmetic tasks.
Random word prefix prompts show emergence patterns similar to no-prompt, suggesting prompt length alone does not shift truth geometry. · claim · 0.774
Control experiment ruling out token-count as the cause of truth geometry shifts.