claim
active

Probes trained under different explicit instruction variants are highly aligned with each other despite different wording.

This indicates that the key divide is between passive and active framing, not the specific wording of the instructions.
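One way to make "highly aligned" concrete is to compare probe directions by pairwise cosine similarity. The sketch below assumes each probe is a linear probe summarized by a single weight vector. The source does not state which alignment metric was used, so cosine similarity, the `pairwise_cosine` helper, and the synthetic vectors are illustrative assumptions rather than the paper's method or data.

```python
import numpy as np


def pairwise_cosine(directions: np.ndarray) -> np.ndarray:
    """Return the matrix of cosine similarities between the rows of `directions`."""
    norms = np.linalg.norm(directions, axis=1, keepdims=True)
    unit = directions / norms  # normalize each probe direction to unit length
    return unit @ unit.T


# Synthetic stand-ins (NOT real probe weights): three hypothetical probes that
# share one underlying direction plus a small amount of independent noise,
# mimicking probes trained under differently worded explicit instructions.
rng = np.random.default_rng(0)
shared = rng.normal(size=64)
probes = np.stack([shared + 0.1 * rng.normal(size=64) for _ in range(3)])

sims = pairwise_cosine(probes)
print(np.round(sims, 3))
```

Off-diagonal entries close to 1 would correspond to the kind of alignment the claim describes. Under the same assumption, probes trained under a passive framing would be expected to show lower similarity to this group.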

Source paper

extracted_from
Testing the Limits of Truth Directions in LLMs
(2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi
