Judea Pearl (thinker:judea-pearl)
Developed causal graph models and the do-operator, foundational to modern causal inference.
Authored papers (1)
- Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations (2023) ⓒ 9
Distributed alignment search (DAS) resolves two blocking limitations of prior causal abstraction work (brute-force alignment search and the localist assumption that high-level variables map to disjoint neuron sets) by using gradient descent over orthogonal rotation matrices to find alignments in non-standard bases of neural representations. On a hierarchical equality task, a three-layer feed-forward network with hidden size 16 achieves an interchange intervention accuracy (IIA) of 1.00 under DAS at layer 1 with an 8-dimensional intervention subspace, whereas the best brute-force localist search reaches only 0.60 IIA and the closest localist alignment only 0.73 IIA. On the Monotonicity NLI benchmark, BERT-base fine-tuned on MoNLI achieves 1.00 IIA at layer 9 when 256 non-standard basis dimensions of the [CLS] token encode lexical entailment and 256 others encode negation, while no localist alignment exceeds 0.51 IIA on the same task. A subsequent subspace decomposition reveals a structural asymmetry: the hierarchical equality representations of w=x and y=z cannot be decomposed into representations of individual input identities (subspace DAS IIA ≈ 0.50–0.51), whereas the apparent lexical-entailment representation in BERT decomposes almost perfectly (IIA ≈ 0.97–0.98) into two word-identity representations. The DAS results imply that previous negative or weak causal abstraction findings may have been artifacts of the localist assumption, and that neural networks can genuinely implement tree-structured symbolic algorithms, although apparent relational representations may sometimes be data structures over entity identities rather than true relational encodings.
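To make the phrase "alignments in non-standard bases" concrete, the following is a minimal NumPy sketch of a single interchange intervention in a rotated subspace. It is an illustration under stated assumptions, not the paper's implementation: the function names `random_orthogonal` and `distributed_interchange` are hypothetical, and the rotation here is a fixed random orthogonal matrix, whereas DAS learns the rotation by gradient descent. The hidden size of 16 and subspace size of 8 mirror the hierarchical equality setup described above.

```python
import numpy as np


def random_orthogonal(n, seed=0):
    """Return a random n x n orthogonal matrix (stand-in for a learned rotation)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q


def distributed_interchange(h_base, h_source, rotation, k):
    """Swap the first k rotated coordinates of h_base with those of h_source.

    The columns of `rotation` define the non-standard basis. Coordinates in
    that basis are rotation.T @ h; mapping back is rotation @ coordinates.
    """
    r_base = rotation.T @ h_base
    r_source = rotation.T @ h_source
    r_base[:k] = r_source[:k]   # intervene only on the k-dimensional subspace
    return rotation @ r_base    # return to the standard neuron basis


# Hypothetical hidden states for a base input and a source input.
rng = np.random.default_rng(1)
h_base = rng.standard_normal(16)
h_source = rng.standard_normal(16)

rotation = random_orthogonal(16)
h_new = distributed_interchange(h_base, h_source, rotation, k=8)
```

In the rotated basis, the first 8 coordinates of `h_new` come from the source and the remaining 8 from the base, even though every individual neuron in the standard basis may have changed. A localist intervention is the special case where `rotation` is the identity matrix. IIA is then the fraction of such interventions for which the network's output matches the output of the high-level causal model under the corresponding intervention.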
Originates (1)
Studies (1)
Co-authors (8)
- Atticus Geiger (3 shared)
- Christopher Potts (3 shared)
- Noah D. Goodman (3 shared)
- Thomas Icard (3 shared)
- Zhengxuan Wu (3 shared)
- David E. Rumelhart (1 shared)
- James L. McClelland (1 shared)
- Paul Smolensky (1 shared)
Their work is cited by (6)
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks (1× refs)
- Addressing divergent representations from causal interventions on neural networks (1× refs)
- pyvene: A Library for Understanding and Improving PyTorch Models via Interventions (1× refs)
- Model Alignment Search (1× refs)
- The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability? (1× refs)
- Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts (1× refs)
Other inbound relations (4)
- mentions: Active inference on discrete state-spaces: a synthesis (paper)
- mentions: Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations (paper)
- mentions: Model Alignment Search (paper)
- mentions: Yuan 2023 Emergence and Causality in Complex Systems: A Survey (artifact)
Recent mentions (5)
- papers-typed: grant-2025-alignment-search.md
- papers-typed: geiger-2023-finding-alignments.md
- papers-typed: dacosta_2020_active_inference_discrete.md
- papers-typed: friston_2013_life_as_we_know_it.md
- papers-typed: yuan-2023-emergence.md