claim

active

claim:das-overcomes-the-localist-limitation-of-prior-causal-abstraction-by-allowing-individual-neurons-to-play-multiple-roles-via-non-standard-bases

DAS overcomes the localist limitation of prior causal abstraction by allowing individual neurons to play multiple roles via non-standard bases

Central claim motivating DAS over prior methods.
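
The contrast in this claim can be illustrated with a toy two-neuron example. The sketch below is illustrative only and is not the paper's implementation: the function names, the fixed 45-degree rotation, and the variable values are all assumptions made for the example, and the rotation is set by hand rather than learned. It stores two variables along rotated directions, so each neuron carries a mix of both, and then compares swapping one neuron (a localist intervention) with swapping one coordinate in the rotated basis (a distributed intervention).

```python
import math

def rotate(v, theta):
    """Rotate the 2D vector v by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def encode(a, b, theta):
    """Hypothetical hidden layer: variables (a, b) are stored along rotated axes."""
    return rotate((a, b), theta)

def decode(h, theta):
    """Read the variables back out by undoing the rotation."""
    return rotate(h, -theta)

def localist_swap(base, source):
    """Interchange neuron 0 only, i.e. intervene in the standard basis."""
    return (source[0], base[1])

def distributed_swap(base, source, theta):
    """Rotate both activations, interchange coordinate 0, rotate back."""
    rb, rs = rotate(base, -theta), rotate(source, -theta)
    return rotate((rs[0], rb[1]), theta)

theta = math.pi / 4                 # assumed rotation; fixed by hand here
base = encode(1.0, 2.0, theta)      # base input carries a=1, b=2
source = encode(5.0, 7.0, theta)    # source input carries a=5, b=7

# Goal of the intervention: take a from the source, keep b from the base.
a_d, b_d = decode(distributed_swap(base, source, theta), theta)
a_l, b_l = decode(localist_swap(base, source), theta)

print("distributed:", round(a_d, 6), round(b_d, 6))
print("localist:   ", round(a_l, 6), round(b_l, 6))
```

In this toy setting the distributed swap recovers a from the source and b from the base, while the single-neuron swap disturbs both variables, because each neuron participates in encoding both.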

Source paper

extracted_from

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

(2023) · Atticus Geiger · Zhengxuan Wu · Christopher Potts · Thomas Icard +1

Neighborhood — ranked by edge-count

Papers (1)

paper

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
introduces

Findings (4)

finding

DAS achieves 100% IIA on hierarchical equality task with |N|=16, intervention size 8, Layer 1
supports
DAS discovers a perfect alignment between the feed-forward network and the Both Equality Relations high-level model.
DAS achieves 100% IIA for combined Negation and Lexical Entailment model on MoNLI at Layer 9, intervention size 256
supports
Perfect abstraction relation between BERT and symbolic algorithm with negation and lexical entailment variables.
Best localist alignment achieves IIA of 0.73 on hierarchical equality Both Equality Relations in Layer 1
supports
Shows localist alignment fails to capture the distributed structure found by DAS.
Localist alignment achieves ~0.51 IIA on MoNLI tasks, near chance performance
supports
Localist methods fail entirely on MoNLI distributed representations.
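
The IIA figures above are interchange intervention accuracies. A minimal sketch of how such a score could be computed, assuming it is the fraction of interchange interventions on which the intervened network's output matches the intervened high-level model's output; the function name and the example labels are hypothetical.

```python
def interchange_intervention_accuracy(low_level_outputs, high_level_outputs):
    """Fraction of interventions where the network agrees with the high-level model."""
    if len(low_level_outputs) != len(high_level_outputs):
        raise ValueError("output lists must have the same length")
    if not low_level_outputs:
        raise ValueError("at least one intervention is required")
    matches = sum(
        1 for low, high in zip(low_level_outputs, high_level_outputs) if low == high
    )
    return matches / len(low_level_outputs)

# Hypothetical labels from four interchange interventions.
network_predictions = ["equal", "unequal", "equal", "equal"]
symbolic_predictions = ["equal", "unequal", "unequal", "equal"]
iia = interchange_intervention_accuracy(network_predictions, symbolic_predictions)
print(iia)
```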

Questions (1)

question

Can an interpretable symbolic algorithm be used to faithfully explain a complex neural network model?
gates
Framing question for the paper's research program.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

DAS achieves substantial causal effect even on arbitrary input-output mappings where no causal mechanism should exist · finding · 0.820
Replication of Wu et al. 2023 finding; DAS expressivity concern validated in CausalGym setup
There is a many-to-many mapping between neurons and concepts, meaning multiple high-level causal variables might be encoded in overlapping groups of neurons · claim · 0.794
Fundamental theoretical claim motivating DAS, attributed to Smolensky/Rumelhart/McClelland.
Early causal abstraction methods (Geiger et al. 2021) implicitly rely on the privileged bases hypothesis, while recent methods (Geiger et al. 2024b) rely on the linear representation hypothesis · claim · 0.790
Historical framing of how representation assumptions have evolved in causal interpretability
The discovery of perfect abstract equality representations that cannot be decomposed into entity representations is a foundational result informing our understanding of how symbolic and connectionist architectures coexist · claim · 0.786
Concluding claim about theoretical significance of the hierarchical equality finding.
Smolensky (1986) proposes that viewing a neural representation under a basis that is not aligned with individual neurons can reveal the interpretable distributed structure of the neural representations. · quote · 0.780
Load-bearing theoretical claim providing the conceptual foundation for DAS.
causal abstraction implicitly relies on strong assumptions about how features are encoded in deep neural networks (DNNs), and becomes trivial without such assumptions · quote · 0.779
Load-bearing formulation of the paper's central argument
You can only get the profound multiple structure of centers by unfolding each bit from the previous state, allowing the next layer of structure to appear from the previously established layers. · claim · 0.777
Explains why time and sequence are essential for generated complexity.
Causal abstraction implicitly relies on strong assumptions about feature encoding in DNNs, and becomes trivial without such assumptions · claim · 0.777
Authors' interpretation connecting their proof to practical interpretability methodology