claim
active
claim:causal-abstraction-theory-is-a-unified-framework-that-subsumes-diverse-intervention-based-interpretability-methods-including-lime-causal-mediation-analysis-inlp-and-circuit-explanations
Causal abstraction theory is a unified framework that subsumes diverse intervention-based interpretability methods including LIME, causal mediation analysis, INLP, and circuit explanations
The paper endorses the claim of Geiger et al. (2023) that disparate interpretability methods are instances of causal abstraction.
Source paper
extracted_from(2023) · Atticus Geiger · Zhengxuan Wu · Christopher Potts · Thomas Icard +1
Neighborhood (ranked by edge count)
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
- The formal method used to establish that the identified circuit causally mediates the model's cyclic reasoning behavior
- Central thesis of the paper
- Historical framing of how representation assumptions have evolved in causal interpretability
- Formal definition: H is a constructive abstraction of L under alignment Π when interchange interventions have equivalent effects at both levels.
- Methodological claim about the scientific value of combining causal abstraction with representational geometry analysis
- Load-bearing formulation of the paper's central argument
- Authors' interpretation connecting their proof to practical interpretability methodology
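The formal definition listed above (H abstracts L under alignment Π when interchange interventions have equivalent effects at both levels) can be illustrated with a toy check. This is a minimal sketch under our own assumptions. The models `high_level`, `low_level_good`, and `low_level_bad` and the helper `interchange_consistent` are hypothetical. The alignment Π is hard-coded so that the high-level variable S maps to a hidden unit `h` and the output O maps to `out`. Equivalence is only tested exhaustively over a small grid of integer inputs, not proved in general.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Inputs = Tuple[int, int, int]
Model = Callable[..., Dict[str, int]]


def high_level(inp: Inputs, s_override: Optional[int] = None) -> Dict[str, int]:
    """Hypothetical high-level causal model H: S = a + b, O = S + c."""
    a, b, c = inp
    s = a + b if s_override is None else s_override
    return {"S": s, "O": s + c}


def low_level_good(inp: Inputs, h_override: Optional[int] = None) -> Dict[str, int]:
    """Hypothetical low-level model L: hidden unit h stores 3 * (a + b)."""
    a, b, c = inp
    h = 3 * (a + b) if h_override is None else h_override
    return {"h": h, "out": h // 3 + c}  # h is always a multiple of 3 here


def low_level_bad(inp: Inputs, h_override: Optional[int] = None) -> Dict[str, int]:
    """Same input-output behavior, but h stores a + c instead of a + b."""
    a, b, c = inp
    h = a + c if h_override is None else h_override
    return {"h": h, "out": h + b}


def interchange_consistent(low: Model, hidden: str, out_key: str,
                           grid: List[Inputs]) -> bool:
    """Check that every interchange intervention has the same effect in H and L.

    For each (base, source) pair: patch S in H with its value on `source`,
    patch the aligned hidden unit in L with its value on `source`, run both
    on `base`, and compare the outputs (outputs are aligned by identity).
    """
    for base, source in product(grid, repeat=2):
        s_src = high_level(source)["S"]
        h_src = low(source)[hidden]
        high_out = high_level(base, s_override=s_src)["O"]
        low_out = low(base, h_override=h_src)[out_key]
        if high_out != low_out:
            return False
    return True


GRID: List[Inputs] = list(product(range(3), repeat=3))
```

Both toy networks compute the same function on unpatched inputs, so behavioral testing alone cannot tell them apart. Only `low_level_good` passes `interchange_consistent(low_level_good, "h", "out", GRID)`. In `low_level_bad`, patching `h` from a source input changes the output differently than patching S does in H. This is the kind of distinction the interchange-intervention criterion is meant to capture.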