finding
active
finding:probes-trained-on-a1-degrade-significantly-when-evaluated-on-a2-and-more-on-a3-training-on-a2-achieves-only-auroc-0-65-on-a3

Probes trained on A1 degrade significantly when evaluated on A2, and degrade further on A3; probes trained on A2 reach an AUROC of only ~0.65 on A3.

This indicates that the generalization of arithmetic truth directions decays rapidly with each additional operation.
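
For reading the AUROC figure: AUROC is the probability that a randomly chosen true statement receives a higher probe score than a randomly chosen false one, so 0.5 is chance level and 1.0 is perfect separation. The sketch below is a minimal illustration of that definition, not the paper's evaluation code; the helper names `auroc` and `probe_scores`, and the idea of scoring by projection onto a single direction, are assumptions made for the example.

```python
from typing import List, Sequence


def probe_scores(direction: Sequence[float],
                 activations: Sequence[Sequence[float]]) -> List[float]:
    """Project each activation vector onto a (hypothetical) probe direction."""
    return [sum(w * x for w, x in zip(direction, act)) for act in activations]


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of (true, false) pairs ranked correctly.

    labels use 1 for true statements and 0 for false ones.
    A tie between a true and a false example counts as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one true and one false example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Under this reading, a cross-set evaluation would fit the direction on one set (for example A2), call `probe_scores` on activations from another (for example A3), and pass the result with the A3 labels to `auroc`. A value near 0.65 then means the probe ranks a true statement above a false one in roughly 65% of pairs.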

Source paper

extracted_from
Testing the Limits of Truth Directions in LLMs
(2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi
