finding
active
finding:f0-trained-probes-in-layers-4-10-show-inverted-separation-on-f1-auroc-0-systematically-misclassifying-true-statements-as-false

Probes trained on F0 at layers 4-10 show inverted separation when evaluated on F1 (AUROC ≈ 0, far below the chance level of 0.5), systematically misclassifying true statements as false.

The inversion indicates that early-layer probes capture sentence polarity rather than truth.
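An AUROC near 0 is not the same as a probe that learned nothing: a chance-level probe sits near 0.5, while a value near 0 means the probe ranks almost every false statement above every true one. The sketch below illustrates this with a small pairwise AUROC function. The scores and labels are made-up illustrative values, not data from the paper, and the `auroc` helper is a hypothetical stand-in for whatever evaluation code the authors used.

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. This equals the area under the ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only (assumption): label 1 = true statement,
# and a higher probe score is meant to indicate "more likely true".
labels = [1, 1, 1, 0, 0, 0]
inverted_scores = [-2.0, -1.5, -0.5, 0.5, 1.0, 2.5]

a = auroc(inverted_scores, labels)                       # fully inverted ranking
a_flipped = auroc([-s for s in inverted_scores], labels)  # same probe, sign flipped

print(a, a_flipped)
```

Negating the scores maps an AUROC of a to 1 - a, so a probe with AUROC ≈ 0 still separates the two classes cleanly, only with the opposite sign. That is consistent with a direction that tracks some other feature, such as polarity, whose relationship to truth flips between the two datasets.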

Source paper

extracted_from
Testing the Limits of Truth Directions in LLMs
(2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • The claim that truth directions are consistent and generalizable across layers, tasks, and prompt formats in LLMs.
