claim
active
claim:truth-directions-fail-to-generalize-to-harder-tasks-f3-f5-regardless-of-prompt-template-because-activations-remain-highly-entangled
Truth directions fail to generalize to harder tasks (F3-F5) regardless of prompt template because activations remain highly entangled.
Establishes task difficulty as a hard limit that instructions cannot overcome.
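As a rough illustration of what "fails to generalize" means operationally, the sketch below fits a difference-of-means direction on one task and scores a second task by projecting onto it, then reports AUROC. Everything here is an assumption for illustration: the activations are synthetic Gaussians rather than model hidden states, `synthetic_task` and its `separation` parameter are hypothetical stand-ins for easy (separable) versus hard (entangled) tasks, and difference-of-means is one common probing choice, not necessarily the source paper's method.

```python
import numpy as np

def mean_diff_direction(acts, labels):
    """Unit vector pointing from the mean false activation to the mean true one."""
    labels = np.asarray(labels, dtype=bool)
    d = acts[labels].mean(axis=0) - acts[~labels].mean(axis=0)
    return d / np.linalg.norm(d)

def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic (ties count as one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def synthetic_task(rng, n, dim, separation):
    """Hypothetical stand-in for activations: true/false clusters offset along axis 0.

    separation=0.0 makes the two classes fully overlapping ("entangled").
    """
    labels = rng.integers(0, 2, size=n).astype(bool)
    acts = rng.normal(size=(n, dim))
    acts[:, 0] += separation * np.where(labels, 1.0, -1.0)
    return acts, labels

rng = np.random.default_rng(0)
train_acts, train_labels = synthetic_task(rng, 400, 16, separation=2.0)
easy_acts, easy_labels = synthetic_task(rng, 400, 16, separation=2.0)
hard_acts, hard_labels = synthetic_task(rng, 400, 16, separation=0.0)

# Fit the direction on the training task only, then reuse it unchanged.
direction = mean_diff_direction(train_acts, train_labels)
auroc_easy = auroc(easy_acts @ direction, easy_labels)
auroc_hard = auroc(hard_acts @ direction, hard_labels)

print(f"separable task AUROC: {auroc_easy:.3f}")
print(f"entangled task AUROC: {auroc_hard:.3f}")
```

In this toy setup the transferred direction separates the held-out separable task well and stays near chance on the entangled one. No prompt-side change can help in the second case, because the projection has no class signal to recover.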
Source paper
extracted_from · (2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi
Neighborhood — ranked by edge-count
Findings (2)
finding
- Establishes F3-F5 as a hard generalization boundary that instructions cannot overcome.
- Visual geometric evidence for the fundamental entanglement of true/false activations in harder tasks.
Claims (1)
claim
- Pure factual-recall tasks F0-F2 show robust AUROC performance across all instruction template variations. · associated_with · Contrasts with harder tasks that are sensitive to prompt variations.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Shows instruction effects extend to harder factual tasks.
- Truth directions emerge in earlier layers for factual tasks and later layers for arithmetic tasks. · claim · 0.835 · Core empirical claim about the layer-dependence of truth direction emergence as a function of task type.
- Specific question motivating the cross-template generalization experiment in Section 5.2.
- Overarching conclusion summarizing the paper's contribution relative to prior universality claims.
- Establishes generalizability of the core difficulty-boundary finding across model families.
- Identified as the exact computational operation that breaks truth direction generalization.
- Argues against the single-layer analysis approach of prior work.
- Safety implication derived from the multi-dimensional truth structure finding.