claim

active

claim:using-the-ask-correct-prompt-improves-cross-task-generalization-of-arithmetic-probes-to-factual-tasks-f0-f2

Using the ask-correct prompt improves cross-task generalization of arithmetic probes to factual tasks F0-F2.

Finding that explicit correctness framing partially aligns truth directions across task families.

Source paper

extracted_from

Testing the Limits of Truth Directions in LLMs

(2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi

Neighborhood — ranked by edge-count

Findings (1)

finding

Under ask-correct, probes trained on arithmetic tasks A1-A3 generalize almost perfectly to factual tasks F0-F2 (AUROC ~1.0), whereas under no-prompt this generalization is largely absent.
supports
Key improvement in cross-task generalization enabled by explicit instruction framing.
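
The cross-task evaluation described in this finding can be sketched on synthetic data: fit a linear probe on one task family, then score a different family with it and measure AUROC. This is only an illustration under stated assumptions and does not reproduce the paper's results. The difference-of-means probe, the 16-dimensional Gaussian "activations", and every name below (`fit_probe`, `make_task`, `auc_aligned`, `auc_unaligned`) are hypothetical, and the source does not state which probe type the authors use. A shared truth direction stands in for the ask-correct setting and an orthogonal one for the no-prompt setting.

```python
import random

def auroc(scores, labels):
    """Fraction of (true, false) pairs where the true example scores higher; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_vec(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_probe(X, y):
    """Difference-of-means probe (an assumed probe type): mean(true) - mean(false)."""
    mu_true = mean_vec([x for x, t in zip(X, y) if t == 1])
    mu_false = mean_vec([x for x, t in zip(X, y) if t == 0])
    return [a - b for a, b in zip(mu_true, mu_false)]

def project(X, w):
    """Probe score for each example: dot product with the probe direction."""
    return [sum(a * b for a, b in zip(x, w)) for x in X]

def make_task(rng, truth_dir, n=400, sep=6.0):
    """Synthetic activations: unit Gaussian noise shifted by +/- sep/2 along truth_dir."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate true / false statements
        sign = 1.0 if label else -1.0
        X.append([rng.gauss(0.0, 1.0) + sign * (sep / 2) * d for d in truth_dir])
        y.append(label)
    return X, y

rng = random.Random(0)
dim = 16
e0 = [1.0] + [0.0] * (dim - 1)        # truth direction of the training task
e1 = [0.0, 1.0] + [0.0] * (dim - 2)   # an orthogonal truth direction

# Train the probe on the stand-in "arithmetic" task.
X_train, y_train = make_task(rng, e0)
w = fit_probe(X_train, y_train)

# Stand-in "factual" task that shares the truth direction (aligned case).
X_shared, y_shared = make_task(rng, e0)
# Stand-in "factual" task with an unrelated truth direction (unaligned case).
X_other, y_other = make_task(rng, e1)

auc_aligned = auroc(project(X_shared, w), y_shared)
auc_unaligned = auroc(project(X_other, w), y_other)
print(f"aligned AUROC:   {auc_aligned:.3f}")
print(f"unaligned AUROC: {auc_unaligned:.3f}")
```

In this toy setup the probe transfers only when the two tasks encode truth along a common direction, which is the geometric reading of the finding: near-perfect AUROC on F0-F2 under ask-correct is what one would expect if the prompt brings the arithmetic and factual truth directions into alignment.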

Claims (1)

claim

The generalization improvement from explicit instructions observed in Llama models (A1-A3 to F0-F2) is more pronounced for F3-F5 to F0-F2 in Gemma models.
extends
Shows that the instruction effect, although it shifts geometry, may not produce consistent generalization effects across model families.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

The ask-arith prompt shows weaker generalization to factual tasks compared to other explicit prompts, suggesting a specialized arithmetic prompt does not create a unified truth direction across task families.
claim · 0.854
From the cross-task generalization heatmaps in Appendix B.3.3.

Probes trained under different explicit instruction prompts (ask-correct, ask-t/f, ask-able, ask-arith) are highly aligned with each other in cosine similarity.
finding · 0.845
Shows the passive vs. active divide is more important than the specific wording of instructions.

The ask-correct template delays truth direction emergence for F3 and reduces performance for F4-F5 compared to no-prompt.
finding · 0.814
Shows instruction effects extend to harder factual tasks.

Will the no-prompt truth directions generalize to ask-correct activations?
question · 0.785
Specific question motivating the cross-template generalization experiment in Section 5.2.

Truth probes fail to generalize to harder factual tasks F3-F5 regardless of prompt template, with AUROC near or below 0.6.
finding · 0.774
Establishes F3-F5 as a hard generalization boundary that instructions cannot overcome.

Truth directions fail to generalize to harder tasks (F3-F5) regardless of prompt template because activations remain highly entangled.
claim · 0.772
Establishes task difficulty as a hard limit that instructions cannot overcome.

Within-family factual generalization (F0-F2) is consistently strong across all models and prompt settings.
finding · 0.771
Establishes a reliable baseline for factual truth direction universality within simple factual recall.

Factual tasks F0-F3 reach near-perfect AUROC in early-to-mid layers of Llama-3.1-8B; arithmetic tasks A1-A3 emerge much later; counting tasks F4-F5 emerge late similar to arithmetic.
finding · 0.771
Core empirical finding about layer-dependent truth direction emergence across task types.