paper:geiger-causal-abstractions-of-neural-networks-2021
Causal abstractions of neural networks
Related work: refs + corpus + external arXiv
Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows are weighted higher.
- Combining Causal Models for More Accurate Abstractions of Neural Networks · Sara Magliacane, Atticus Geiger, Theodora-Mara Pîslar · 2025 · ≈ 76%
- Conceptual Views of Neural Networks: A Framework for Neuro-Symbolic Analysis · Johannes Hirth and Tom Hanika · 2026 · ≈ 74%
- CausalARC: Abstract Reasoning with Causal World Models · John Kalantari, Kia Khezeli, Jacqueline Maasch · 2026 · ≈ 73%
- Causal Bayesian Networks for Data-driven Safety Analysis of Complex Systems · Lina Putze, Tjark Koopmann, Jan Reich, Christian Neurohr, Roman Gansch · 2025 · ≈ 72%
- On the Mechanistic Interpretability of Neural Networks for Causality in Bio-statistics · Jean-Baptiste A. Conan · 2025 · ≈ 72%
- Generative artificial intelligence-enabled dynamic detection of nicotine-related circuits · Changhong Jing, Ye Li, Xinan Liu, Zuxin Chen, Shuqiang Wang, Changwei Gong · 2022 · ≈ 71%
- PLOT: Progressive Localization via Optimal Transport in Neural Causal Abstraction · Arya Datla, Ziv Goldfeld, Jonathn Chang · 2026 · ≈ 71%
- Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations · in corpus · 2023 · ≈ 71%
- Causal Learner: A Toolbox for Causal Structure and Markov Blanket Learning · Kui Yu, Yiwen Zhang, Lin Liu, Jiuyong Li, Zhaolong Ling · 2025 · ≈ 70%
- Abstracting Deep Neural Networks into Concept Graphs for Concept Level Interpretability · Parth Natekar, Ganapathy Krishnamurthi, Balaji Srinivasan, Avinash Kori · 2022 · ≈ 70%
- Identifying Sub-networks in Neural Networks via Functionally Similar Representations · Amit Dhurandhar, Karthikeyan Natesan Ramamurthy, Dennis Wei, Tian Gao · 2025 · ≈ 70%
- The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability? · in corpus · 2025 · ≈ 69%
- Emergence and Causality in Complex Systems: A Survey on Causal Emergence and Related Quantitative Studies · in corpus · 2023 · ≈ 69%
- Cognitive glues are shared models of relative scarcities: the economics of collective intelligence · in corpus · 2026 · ≈ 68%
- The World Inside Neural Networks · in corpus · 2026 · ≈ 66%
Similar preprints (Semantic Scholar)
Cited by (9)
- Addressing divergent representations from causal interventions on neural networks
Causal intervention methods central to mechanistic interpretability—including activation patching, mean-difference vector patching, Sparse Autoencoders, and Distributed Alignment Search (DAS)—systemat
- Model Alignment Search
Model Alignment Search (MAS) establishes bidirectional causal similarity between neural networks by learning a per-model orthogonal rotation matrix that isolates behaviorally relevant subspaces and us
- The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Under arbitrarily powerful alignment maps, causal abstraction becomes vacuous: any neural network can be perfectly mapped to any algorithm, a result proven formally in Theorem 1 under five mild assump
- pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
pyvene is an open-source Python library that unifies intervention-based research on PyTorch neural models by treating the intervention itself—rather than model surgery code—as the primitive abstractio
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks
CausalGym, a benchmark derived from SyntaxGym's 33 test suites and expanded to 29 tasks, establishes that distributed alignment search (DAS) consistently outperforms linear probing, difference-in-mean
- The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
At sufficient scale, LLMs linearly represent the truth or falsehood of factual statements in their internal activations — a claim supported by PCA visualizations, cross-dataset probe transfer, and cau
- Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Distributed alignment search (DAS) resolves two blocking limitations of prior causal abstraction work—brute-force alignment search and the localist assumption that high-level variables map to disjoint
- Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
Manifold steering — intervening on model activations along paths constrained to lie on a learned activation manifold M_h rather than along Euclidean linear directions — produces behavioral trajectorie
- Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts
Llama-3.1-8B solves cyclic arithmetic (e.g., "what month is six months after August?") not by performing modular addition in the period of the cyclic concept (12 for months, 7 for days of the week) as