claim
active
claim:causal-abstraction-theory-is-a-unified-framework-that-subsumes-diverse-intervention-based-interpretability-methods-including-lime-causal-mediation-analysis-inlp-and-circuit-explanations

Causal abstraction theory is a unified framework that subsumes diverse intervention-based interpretability methods, including LIME, causal mediation analysis, INLP, and circuit explanations

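The paper endorses the claim of Geiger et al. (2023) that seemingly disparate interpretability methods (LIME, causal mediation analysis, INLP, and circuit explanations) can each be understood as instances of causal abstraction.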

Source paper

extracted_from
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
(2023) · Atticus Geiger · Zhengxuan Wu · Christopher Potts · Thomas Icard +1

Neighborhood — ranked by edge-count

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.