finding
active
finding:boundless-das-interchange-interventions-produce-emd-exceeding-natural-natural-baseline
Boundless DAS interchange interventions produce EMD exceeding natural-natural baseline
Empirical demonstration that DAS interventions produce divergent representations
Source paper
extracted_from · (2025) · Satchel Grant · Simon Jerome Han · Alexa R. Tartaglini · Christopher Potts
Neighborhood — ranked by edge-count
Claims (1)
claim
- Core empirical claim of the paper supported by both theoretical proof and empirical demonstration
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- SAE reconstructions on Llama-3-8B layer 25 produce intervened EMD exceeding the natural-natural baseline · finding · 0.763 · Empirical demonstration that SAE projections produce divergent representations in a real LLM
- Interpretive claim from Case Study II about the distinction between correlational probes and causal interventions
- DAS behavioral loss produces EMD along feature dimensions of 0.032±0.003 on synthetic 10-class dataset · finding · 0.748 · Quantitative baseline for divergence using behavioral DAS loss on synthetic dataset
- Practical utility of reducing divergence demonstrated through regression analysis
- Empirical result showing the CL loss can reduce divergence without sacrificing interpretability accuracy
- Replication of Wu et al. 2023 finding; DAS expressivity concern validated in CausalGym setup
- Empirical demonstration that MDVP produces divergent representations in a real LLM
- Central claim motivating DAS over prior methods.