finding
active
finding:truth-probes-fail-to-generalize-to-harder-factual-tasks-f3-f5-regardless-of-prompt-template-with-auroc-near-or-below-0-6

Truth probes fail to generalize to the harder factual tasks F3-F5 regardless of prompt template: AUROC on these tasks stays near or below 0.6, not far above the 0.5 chance level.

This establishes F3-F5 as a hard generalization boundary that prompt instructions cannot overcome.
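
For readers less familiar with the metric, the following is a minimal sketch of how an AUROC of about 0.6 arises from probe scores. It is not the paper's evaluation code: the function name `auroc`, the labels, and both score lists are hypothetical values chosen for illustration. AUROC is computed here as the fraction of (true, false) statement pairs in which the true statement receives the higher probe score, with ties counted as half.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = true statement, 0 = false statement.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Made-up probe scores on a task where the probe separates the classes cleanly.
easy_task_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]

# Made-up probe scores on a harder task where true and false statements interleave.
hard_task_scores = [0.9, 0.8, 0.5, 0.3, -0.1, 0.7, 0.6, 0.4, 0.2, 0.0]

easy_auroc = auroc(labels, easy_task_scores)
hard_auroc = auroc(labels, hard_task_scores)
print(f"easy task AUROC: {easy_auroc:.2f}")
print(f"hard task AUROC: {hard_auroc:.2f}")
```

In the second score list only 15 of the 25 true/false pairs are ordered correctly, so the probe is only modestly better than a random ranking, which would order about half of the pairs correctly.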

Source paper

extracted_from
Testing the Limits of Truth Directions in LLMs
(2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi

Neighborhood — ranked by edge-count

Claims (1)

claim

Concepts (1)

concept
  • The claim that truth directions are consistent and generalizable across layers, tasks, and prompt formats in LLMs.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.