finding
active
Truth-related directions reliably emerge at 60–75% of normalized layer depth in Qwen and Gemma models
Experiment 1 finding localizing where truth can be causally mediated
Source paper
extracted_from · (2025) · Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Lau, Clayton +4
Neighborhood — ranked by edge-count
Papers (1)
paper
Claims (1)
claim
- Central interpretive claim of the paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Establishes generalizability of the core difficulty-boundary finding across model families.
- Truth directions emerge in earlier layers for factual tasks and later layers for arithmetic tasks. · claim · 0.828 · Core empirical claim about the layer-dependence of truth direction emergence as a function of task type.
- Argues against the single-layer analysis approach of prior work.
- Reflection-inducing directions emerge more clearly in higher layers (ℓ>5) for both models and datasets. · finding · 0.814 · Empirical observation about which network layers encode reflection-relevant information.
- Supported by the geometric transition visible in cosine similarity heatmaps for F0-F3.
- Overarching conclusion summarizing the paper's contribution relative to prior universality claims.
- Variance decomposition showing the disentanglement of polarity from truth across model depth.
- Central empirical conclusion of the paper about the fundamental limits of truth directions.