finding
active
For simple factual tasks F0-F3, probe directions show a sharp geometric transition in middle layers, with late-layer probes converging to high cosine similarity; A3 and F4-F5 show no clear transition.
Geometric evidence of convergence to stable truth directions appears only for the simpler tasks.
Source paper
extracted_from · (2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi
Neighborhood — ranked by edge-count
Claims (1)
claim
- Supported by the geometric transition visible in cosine similarity heatmaps for F0-F3.
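A cosine similarity heatmap of this kind can be sketched as a pairwise comparison of per-layer probe directions. The snippet below is a minimal illustration under stated assumptions, not the paper's code: the function name `cosine_similarity_matrix`, the synthetic `directions` array, and the choice of layer 6 as the transition point are all hypothetical, and the probe vectors are random stand-ins for trained probe weights.

```python
import numpy as np


def cosine_similarity_matrix(directions):
    """Pairwise cosine similarity between per-layer probe directions.

    directions: array of shape (n_layers, d), one probe weight vector per layer.
    Returns an (n_layers, n_layers) matrix; entry (i, j) compares layers i and j.
    """
    W = np.asarray(directions, dtype=float)
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    unit = W / np.clip(norms, 1e-12, None)  # guard against zero-norm vectors
    return unit @ unit.T


# Synthetic stand-in data (assumption): early layers get unrelated random
# directions, late layers share one direction plus small noise.
rng = np.random.default_rng(0)
n_layers, d, transition = 12, 64, 6
shared = rng.normal(size=d)
directions = rng.normal(size=(n_layers, d))
directions[transition:] = shared + 0.1 * rng.normal(size=(n_layers - transition, d))

sim = cosine_similarity_matrix(directions)

# Mean off-diagonal similarity within the early block and within the late block.
iu = np.triu_indices(transition, k=1)
early_mean = sim[:transition, :transition][iu].mean()
iu_late = np.triu_indices(n_layers - transition, k=1)
late_mean = sim[transition:, transition:][iu_late].mean()
```

In this toy setup the late-layer block of `sim` is close to 1 while the early block stays near 0, which is the block structure a sharp mid-layer transition would produce in a heatmap. A task with no clear transition would instead show no such high-similarity block.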
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Demonstrates that early-layer probes capture sentence polarity rather than truth.
- Shows the passive vs. active divide is more important than the specific wording of instructions.
- Layer-wise geometry shows early dip, mid-layer alignment, and late standardization across tasks (claim, cosine 0.790). Qualitative pattern from E3.
- Gemma-3-4B-it shows three-stage layer trajectory and S(ℓ) peak despite scale differences in dr and ρd (finding, cosine 0.788). E3 backbone generalization finding for Gemma; validates pattern across diverse architectures.
- Key improvement in cross-task generalization enabled by explicit instruction framing.
- Argues against the single-layer analysis approach of prior work.
- Variance decomposition showing the disentanglement of polarity from truth across model depth.
- Core empirical finding about layer-dependent truth direction emergence across task types.