claim
active
claim:divergent-representations-are-a-common-if-not-likely-outcome-of-causal-interventions-across-a-wide-range-of-methods
Divergent representations are a common, if not likely, outcome of causal interventions across a wide range of methods
Core empirical claim of the paper supported by both theoretical proof and empirical demonstration
Source paper
extracted_from: (2025) · Satchel Grant · Simon Jerome Han · Alexa R. Tartaglini · Christopher Potts
Neighborhood — ranked by edge-count
Findings (4)
finding
- Theoretical proof that patching produces divergent representations for most manifold geometries
- Empirical demonstration that DAS interventions produce divergent representations
- Empirical demonstration that MDVP produces divergent representations in a real LLM
- Empirical demonstration that SAE projections produce divergent representations in a real LLM
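The first finding above can be illustrated with a toy sketch. This is not the paper's proof or experimental setup; it assumes, purely for illustration, that natural representations lie on the unit circle in two dimensions and that an intervention overwrites one coordinate of a target representation with the same coordinate from a source. The helper names (sample_circle, patch_coordinate, off_manifold_distance) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_circle(n):
    """Sample n 'natural' representations on the unit circle (toy manifold)."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.stack([np.cos(theta), np.sin(theta)], axis=1)


def patch_coordinate(target, source, dim=0):
    """Overwrite one coordinate of each target with the source's value."""
    patched = target.copy()
    patched[:, dim] = source[:, dim]
    return patched


def off_manifold_distance(points):
    """Distance of each point from the unit circle."""
    return np.abs(np.linalg.norm(points, axis=1) - 1.0)


target = sample_circle(1000)
source = sample_circle(1000)
patched = patch_coordinate(target, source)

natural_dist = off_manifold_distance(target)
patched_dist = off_manifold_distance(patched)

# Fraction of patched points that no longer lie on the natural manifold.
divergent_fraction = float(np.mean(patched_dist > 1e-6))
print(f"max natural distance: {natural_dist.max():.2e}")
print(f"divergent fraction after patching: {divergent_fraction:.3f}")
```

In this toy setting the source and target are both natural, yet the patched point lands on the circle only when the two happen to agree on the squared patched coordinate, so nearly every patched representation is off-manifold. Real model geometries are higher-dimensional and need not behave the same way.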
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Do divergent representations change what an intervention can say about an NN's natural mechanisms? · question · 0.829 · Core research question motivating the paper
- Third core research question motivating the CL loss approach in Section 5
- The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
- Load-bearing description of the core pernicious divergence mechanism illustrated in Figure 1
- Practical utility of reducing divergence demonstrated through regression analysis
- Core phenomenon studied: when causal interventions shift internal representations away from the natural distribution
- Core question motivating the shift from linear to geometry-aware steering; answered via manifold alignment analysis.
- Qualified positive claim from spatio permutation analysis where two cases satisfy all three criteria.
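The selection rule for the similarity list (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the system that produced this page: the function similarity_candidates, the entity names, and the toy two-dimensional embeddings are all assumptions made up for the example.

```python
import numpy as np


def similarity_candidates(embeddings, typed_edges, threshold=0.65):
    """Return entity pairs with cosine >= threshold and no typed edge.

    embeddings: dict mapping entity name -> 1-D numpy vector (hypothetical).
    typed_edges: set of (name, name) pairs that already have a typed relation.
    """
    names = sorted(embeddings)
    results = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if (a, b) in typed_edges or (b, a) in typed_edges:
                continue  # already related by a typed edge
            va, vb = embeddings[a], embeddings[b]
            cos = float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
            if cos >= threshold:
                results.append((a, b, cos))
    # Most similar pairs first.
    return sorted(results, key=lambda item: -item[2])


# Toy data, invented for illustration only.
embeddings = {
    "claim": np.array([1.0, 0.0]),
    "question": np.array([0.8, 0.6]),
    "finding": np.array([1.0, 0.1]),
    "other": np.array([0.0, 1.0]),
}
typed_edges = {("claim", "finding")}

candidates = similarity_candidates(embeddings, typed_edges)
for a, b, cos in candidates:
    print(f"{a} ~ {b}: {cos:.3f}")
```

Pairs that already share a typed edge are skipped, so what remains are the candidates for new edges or unrecognized duplicates described in the section header.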