finding
active

Under ask-correct, arithmetic tasks A1-A2 show a gradual increase in AUROC across layers that peaks only in the final layers, unlike the sharp transition seen under no-prompt.

This indicates that explicit instructions delay the emergence of truth directions in arithmetic tasks.
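The contrast between a gradual and a sharp layerwise profile can be made concrete with a small sketch. It assumes one AUROC value per layer has already been computed from probe scores; the helper names `auroc` and `profile_summary` and the two example profiles are hypothetical illustrations, not values or code from the paper.

```python
from typing import Dict, List, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a randomly chosen true statement
    scores higher than a randomly chosen false one (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def profile_summary(per_layer_auroc: List[float]) -> Dict[str, float]:
    """Summarize a layerwise AUROC curve by its peak layer (0-indexed,
    first layer attaining the maximum) and its largest single-layer jump."""
    peak_layer = max(range(len(per_layer_auroc)),
                     key=lambda i: per_layer_auroc[i])
    jumps = [b - a for a, b in zip(per_layer_auroc, per_layer_auroc[1:])]
    largest_jump = max(jumps) if jumps else 0.0
    return {"peak_layer": peak_layer, "largest_jump": largest_jump}


# Made-up profiles for illustration only (NOT results from the paper).
gradual = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85]
sharp = [0.50, 0.50, 0.90, 0.90, 0.90, 0.90, 0.90, 0.90]

gradual_summary = profile_summary(gradual)
sharp_summary = profile_summary(sharp)
```

In this toy setup the gradual profile peaks at the last layer with only small per-layer jumps, while the sharp profile reaches its maximum early through one large jump. Which summary statistics best capture the behavior reported for A1-A2 is a design choice left open here.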

Source paper

extracted_from
Testing the Limits of Truth Directions in LLMs
(2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi
