finding

active

finding:within-family-factual-generalization-f0-f2-is-consistently-strong-across-all-models-and-prompt-settings

Within-family factual generalization (F0-F2) is consistently strong across all models and prompt settings.

Establishes a reliable baseline for the universality of truth directions within simple factual recall.
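To make "within-family generalization" concrete, the following is a minimal sketch of how a train-on-one-task, test-on-another AUROC matrix could be computed. It does not reproduce the paper's pipeline: the activations are synthetic, the mean-difference probe is only one common choice of linear truth direction, and the names `make_task`, `mean_difference_direction`, and `auroc` are hypothetical helpers introduced for illustration. The three synthetic tasks share a truth direction by construction, so the high scores here are built in and are not evidence for the finding.

```python
import numpy as np


def mean_difference_direction(X, y):
    """Unit vector pointing from the mean 'false' activation to the mean 'true' one."""
    d = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return d / np.linalg.norm(d)


def auroc(scores, y):
    """AUROC as P(score of a true item > score of a false item), ties counted as 0.5."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def make_task(rng, truth_dir, n=200, shift=3.0):
    """ASSUMPTION: synthetic stand-in for one task's activations.

    Gaussian noise plus a label-dependent shift along `truth_dir`.
    """
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, truth_dir.size)) + shift * np.outer(2 * y - 1, truth_dir)
    return X, y


rng = np.random.default_rng(0)
shared = np.zeros(32)
shared[0] = 1.0  # ASSUMPTION: all three tasks share this truth direction

tasks = {name: make_task(rng, shared) for name in ("F0", "F1", "F2")}

# Fit a direction on each task, then score every task with it.
matrix = {}
for train_name, (X_train, y_train) in tasks.items():
    direction = mean_difference_direction(X_train, y_train)
    for test_name, (X_test, y_test) in tasks.items():
        matrix[(train_name, test_name)] = auroc(X_test @ direction, y_test)

for (train_name, test_name), value in matrix.items():
    print(f"train {train_name} -> test {test_name}: AUROC {value:.3f}")
```

With real model activations, the off-diagonal entries of this matrix are the quantity of interest: values near 1.0 indicate that a direction fitted on one task transfers to another.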

Source paper

extracted_from

Testing the Limits of Truth Directions in LLMs

(2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi

Neighborhood — ranked by edge-count

Claims (1)

claim

Linear truth directions in LLMs are reliable primarily in factual recall cases and break down when truth assessment depends on computing and storing intermediate results.
supports
Central empirical conclusion of the paper about the fundamental limits of truth directions.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

The generalization improvement from explicit instructions observed in Llama models (A1-A3 to F0-F2) is more pronounced for F3-F5 to F0-F2 in Gemma models. [claim · 0.798]
Shows that the instruction effect, while shifting geometry, may not produce consistent generalization effects across model families.
Pure factual-recall tasks F0-F2 show robust AUROC performance across all instruction template variations. [claim · 0.780]
Contrasts with harder tasks that are sensitive to prompt variations.
The difficulty boundary for truth directions replicates across all four tested models (Llama-3.2-3B, Llama-3.1-8B, Gemma-2-2b, Gemma-2-9b); generalization to F3-F5 remains consistently low regardless of model size or family. [finding · 0.777]
Establishes generalizability of the core difficulty-boundary finding across model families.
Truth probes fail to generalize to harder factual tasks F3-F5 regardless of prompt template, with AUROC near or below 0.6. [finding · 0.775]
Establishes F3-F5 as a hard generalization boundary that instructions cannot overcome.
Truth directions fail to generalize to harder tasks (F3-F5) regardless of prompt template because activations remain highly entangled. [claim · 0.774]
Establishes task difficulty as a hard limit that instructions cannot overcome.
Factual task hierarchy (F0–F5) [framework · 0.772]
A controlled six-level hierarchy of factual tasks increasing in complexity from simple city-location recall to double-counting constraints.
Using the ask-correct prompt improves cross-task generalization of arithmetic probes to factual tasks F0-F2. [claim · 0.771]
Finding that explicit correctness framing partially aligns truth directions across task families.
Under ask-correct, probes trained on arithmetic tasks A1-A3 generalize almost perfectly to factual tasks F0-F2 (AUROC ~1.0), whereas under no-prompt this generalization is largely absent. [finding · 0.768]
Key improvement in cross-task generalization enabled by explicit instruction framing.