claim

active

claim:causal-abstraction-theory-is-a-unified-framework-that-subsumes-diverse-intervention-based-interpretability-methods-including-lime-causal-mediation-analysis-inlp-and-circuit-explanations

Causal abstraction theory is a unified framework that subsumes diverse intervention-based interpretability methods including LIME, causal mediation analysis, INLP, and circuit explanations

The paper endorses the claim of Geiger et al. (2023) that disparate interpretability methods are instances of causal abstraction.

Source paper

extracted_from

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

(2023) · Atticus Geiger · Zhengxuan Wu · Christopher Potts · Thomas Icard +1

Neighborhood — ranked by edge-count

Papers (1)

paper

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
cites · supports

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Causal abstraction · concept · 0.818
A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
Causal abstraction analysis · method · 0.801
The formal method used to establish that the identified circuit causally mediates the model's cyclic reasoning behavior
Causal abstraction is not enough for mechanistic interpretability because it becomes vacuous without assumptions about how models encode information · claim · 0.798
Central thesis of the paper
Early causal abstraction methods (Geiger et al. 2021) implicitly rely on the privileged bases hypothesis, while recent methods (Geiger et al. 2024b) rely on the linear representation hypothesis · claim · 0.787
Historical framing of how representation assumptions have evolved in causal interpretability
Constructive Causal Abstraction · concept · 0.781
Formal definition: H is a constructive abstraction of L under alignment Π when interchange interventions have equivalent effects at both levels.
An interplay between causal abstraction and feature geometry deepens mechanistic understanding of language models · claim · 0.769
Methodological claim about the scientific value of combining causal abstraction with representational geometry analysis
causal abstraction implicitly relies on strong assumptions about how features are encoded in deep neural networks (DNNs), and becomes trivial without such assumptions · quote · 0.767
Load-bearing formulation of the paper's central argument
Causal abstraction implicitly relies on strong assumptions about feature encoding in DNNs, and becomes trivial without such assumptions · claim · 0.759
Authors' interpretation connecting their proof to practical interpretability methodology
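
Illustrative note on the "Constructive Causal Abstraction" entry above: the condition that interchange interventions have equivalent effects at both levels can be checked by brute force on a toy case. The sketch below is only an illustration under stated assumptions, not code from the source paper. The two toy models, the three-integer input format, and the names `high_level`, `low_level`, `interchange`, and `is_abstraction` are all hypothetical. The high-level model H computes (a + b) + c with intermediate variable S = a + b; the low-level model L has hidden values h1 and h2, and the assumed alignment Π maps S to h1.

```python
from itertools import product

def high_level(x, s_override=None):
    """Hypothetical high-level model H: output = S + c, where S = a + b."""
    a, b, c = x
    s = a + b if s_override is None else s_override
    return s + c, s

def low_level(x, h1_override=None):
    """Hypothetical low-level model L with hidden values h1 = a + b and h2 = c."""
    a, b, c = x
    h1 = a + b if h1_override is None else h1_override
    h2 = c
    return h1 + h2, h1

def interchange(model, base, source):
    """Run `model` on `base`, with its aligned variable set to the value it takes on `source`."""
    _, value_on_source = model(source)
    output, _ = model(base, value_on_source)
    return output

def is_abstraction(inputs):
    """Check that interchange interventions agree at both levels for every (base, source) pair."""
    return all(
        interchange(high_level, b, s) == interchange(low_level, b, s)
        for b, s in product(inputs, repeat=2)
    )

# Small exhaustive input grid (an assumption made to keep the check finite).
inputs = list(product(range(3), repeat=3))
print(is_abstraction(inputs))
```

On this toy grid the check passes because h1 carries exactly the information in S. Aligning S with h2 instead would make the two levels disagree under intervention, which is the kind of failure the definition is meant to detect.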