claim
active
claim:truth-directions-emerge-in-earlier-layers-for-factual-tasks-and-later-layers-for-arithmetic-tasks
Truth directions emerge in earlier layers for factual tasks and later layers for arithmetic tasks.
Core empirical claim about the layer-dependence of truth direction emergence as a function of task type.
Source paper
extracted_from (2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi
Neighborhood — ranked by edge-count
Papers (1)
paper
Findings (1)
finding
- Core empirical finding about layer-dependent truth direction emergence across task types.
Hypotheses (1)
hypothesis
- Core working hypothesis motivating the factual vs. arithmetic task split in the experimental design.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Argues against the single-layer analysis approach of prior work.
- Establishes task difficulty as a hard limit that instructions cannot overcome.
- Identified as the exact computational operation that breaks truth direction generalization.
- Truth-related directions reliably emerge at 60–75% of normalized layer depth in Qwen and Gemma models · finding · 0.828 · Experiment 1 finding localizing where truth can be causally mediated.
- Reflection-inducing directions emerge more clearly in higher layers (ℓ>5) for both models and datasets · finding · 0.815 · Empirical observation about which network layers encode reflection-relevant information.
- Methodological critique of prior work that fixed a single layer for truth probing.
- Motivating hypothesis for Section 5's investigation of prompt template effects.
- Overarching conclusion summarizing the paper's contribution relative to prior universality claims.
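
The selection rule stated for this list (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the page's actual implementation: the function names, the dictionary of embeddings, and the representation of typed edges as a set of unordered pairs are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_candidates(embeddings, typed_edges, query, threshold=0.65):
    """Return (entity, cosine) pairs that are similar to `query` but untyped.

    embeddings:  dict mapping entity id -> embedding vector (hypothetical store)
    typed_edges: set of frozenset({a, b}) pairs that already share a typed edge
    threshold:   minimum cosine similarity, 0.65 as stated on this page
    """
    results = []
    for name, vec in embeddings.items():
        if name == query:
            continue
        # Skip entities already linked to the query by a typed relation.
        if frozenset((query, name)) in typed_edges:
            continue
        score = cosine(embeddings[query], vec)
        if score >= threshold:
            results.append((name, score))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, an entity that is close in embedding space but already connected by a typed edge (for example the source paper) is excluded, and everything that remains is a candidate for a new edge or a possible duplicate.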