finding
active
finding:mean-difference-patching-on-llama-3-8b-layer-10-produces-intervened-emd-exceeding-the-natural-natural-baseline
Mean difference patching on Llama-3-8B layer 10 produces intervened EMD exceeding the natural-natural baseline
Empirical demonstration that MDVP produces divergent representations in a real LLM
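The comparison behind this finding can be illustrated with a toy sketch. It is not the paper's code or data: it assumes that mean difference patching means shifting each source activation by the difference of class means, uses made-up 1-D "activations", and uses the exact 1-D earth mover's distance (EMD) for equal-size samples. The names `emd_1d`, `mean_difference_patch`, `class_a`, `class_b_1`, and `class_b_2` are hypothetical.

```python
from statistics import mean


def emd_1d(xs, ys):
    """Exact 1-D earth mover's distance between two equal-size samples."""
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    # In 1-D with equal sample sizes, optimal transport pairs sorted values.
    return mean(abs(a - b) for a, b in zip(sorted(xs), sorted(ys)))


def mean_difference_patch(source, target):
    """Shift every source point by (mean(target) - mean(source))."""
    delta = mean(target) - mean(source)
    return [x + delta for x in source]


# Toy data (assumed, not from the paper): class A is tightly clustered,
# class B is spread out. class_b_1 and class_b_2 are two natural samples
# of class B.
class_a = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.0]
class_b_1 = [2.0, 4.0, 6.0, 8.0, 10.0, 6.0]
class_b_2 = [2.5, 3.5, 6.5, 7.5, 9.5, 6.5]

# Natural-natural baseline: distance between two natural samples of B.
baseline = emd_1d(class_b_1, class_b_2)

# Intervened distance: patch A toward B's mean, then compare with natural B.
patched = mean_difference_patch(class_a, class_b_1)
intervened = emd_1d(patched, class_b_2)

print(f"natural-natural EMD: {baseline:.3f}")
print(f"intervened EMD:      {intervened:.3f}")
```

In this toy setting the patched points match class B's mean but keep class A's narrow spread, so the intervened EMD comes out larger than the natural-natural baseline. This is the same kind of gap the finding reports for real activations, though the mechanism in an LLM may differ.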
Source paper
extracted_from · (2025) · Satchel Grant · Simon Jerome Han · Alexa R. Tartaglini · Christopher Potts
Neighborhood — ranked by edge-count
Claims (1)
claim
- Core empirical claim of the paper supported by both theoretical proof and empirical demonstration
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- SAE reconstructions on Llama-3-8B layer 25 produce intervened EMD exceeding the natural-natural baseline · finding · 0.870 · Empirical demonstration that SAE projections produce divergent representations in a real LLM
- Math and code tasks show strongest mid-layer anchoring on LLaMA (S ≈ −1.65 at layers 8-12) · finding · 0.790 · Task-specific E3 finding showing compositional reasoning requires deeper processing
- Localizes truth representations to specific hidden states, motivating the rest of the analysis
- LLaMA-3.1-8B: Sbmax = -1.896 ± 0.211, AUSN = -2.119 ± 0.198, peak layer ℓ* = 10 (median) · finding · 0.780 · Seed-pooled geometry-only statistics (per-dev z units).
- Synthetic theoretical example showing pernicious divergence via hidden pathway activation
- Validates representational drift theory: later layers specialize for next-token prediction, increasing dr
- Llama-3.3-70B exhibits internal consistency-checking mechanisms that operate during inference · claim · 0.771 · Central interpretive claim of the paper supported by causal ablation and activation evidence
- Connects this study's results to Schrimpf et al. 2021 and Caucheteux et al. 2022/2023 findings on brain-LLM alignment.