Causal abstraction: A theoretical foundation for mechanistic interpretability (Geiger et al., 2025)
Related work (references, corpus, and external arXiv)
Cited, in-corpus, and arXiv badges show which signals surfaced each row; rows surfaced by multiple sources are weighted higher.
- A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i · Louis Jaburi, Kola Ayonrinde · 2025 · ≈ 81%
- Combining Causal Models for More Accurate Abstractions of Neural Networks · Sara Magliacane, Atticus Geiger, Theodora-Mara Pîslar · 2025 · ≈ 80%
- Position: Mechanistic Interpretability Must Disclose Identification Assumptions for Causal Claims · Fengming Liu, Zezheng Lin · 2026 · ≈ 80%
- Mechanistic Interpretability Needs Philosophy · Ninell Oldenburg, Ruchira Dhar, Joshua Hatherley, Constanza Fierro, Nina Rajcic, Sandrine R. Schiller, Filippos Stamatiou, Anders Søgaard, Iwan Williams · 2025 · ≈ 78%
- From Mechanistic to Compositional Interpretability · Thomas Dooms, Steven T. Holmer, Kola Ayonrinde, Geraint A. Wiggins, Ward Gauderis · 2026 · ≈ 78%
- Causally Grounded Mechanistic Interpretability for LLMs with Faithful Natural-Language Explanations · Ajay Pravin Mahale · 2026 · ≈ 77%
- Validating Mechanistic Interpretations: An Axiomatic Approach · Ravi Mangal, Zifan Wang, Saranya Vijayakumar, Corina S. Pasareanu, Somesh Jha, Nils Palumbo · 2025 · ≈ 77%
- Interpretability as Alignment: Making Internal Understanding a Design Principle · Pratinav Seth, Vinay Kumar Sankarapu, Aadit Sengupta · 2025 · ≈ 77%
- A macro agent and its actions · Francesco Massari, Maggie Beheler-Amass, Giulio Tononi, Larissa Albantakis · 2020 · ≈ 77%
- Evaluating Explanations: An Explanatory Virtues Framework for Mechanistic Interpretability -- The Strange Science Part I.ii · Louis Jaburi, Kola Ayonrinde · 2025 · ≈ 76%
- An Encoding of Abstract Dialectical Frameworks into Higher-Order Logic · Alexander Steen, Antoine Martina · 2026 · ≈ 76%
- The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability? · in corpus · 2025 · ≈ 72%
- Emergence and Causality in Complex Systems: A Survey on Causal Emergence and Related Quantitative Studies · in corpus · 2023 · ≈ 70%
- Finger Exercises in Formal Concept Analysis · in corpus · 2006 · ≈ 70%
- Cognitive glues are shared models of relative scarcities: the economics of collective intelligence · in corpus · 2026 · ≈ 69%
- Denotational Design: from meanings to programs · in corpus · 2015 · ≈ 68%
- Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations · in corpus · 2023 · ≈ 68%
Cited by (6)
- Addressing divergent representations from causal interventions on neural networks
Causal intervention methods central to mechanistic interpretability, including activation patching, mean-difference vector patching, Sparse Autoencoders, and Distributed Alignment Search (DAS), systemat…
- Model Alignment Search
Model Alignment Search (MAS) establishes bidirectional causal similarity between neural networks by learning a per-model orthogonal rotation matrix that isolates behaviorally relevant subspaces and us…
- The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Under arbitrarily powerful alignment maps, causal abstraction becomes vacuous: any neural network can be perfectly mapped to any algorithm, a result proven formally in Theorem 1 under five mild assump…
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks
CausalGym, a benchmark derived from SyntaxGym's 33 test suites and expanded to 29 tasks, establishes that distributed alignment search (DAS) consistently outperforms linear probing, difference-in-mean…
- Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
Manifold steering, intervening on model activations along paths constrained to lie on a learned activation manifold M_h rather than along Euclidean linear directions, produces behavioral trajectorie…
- Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts
Llama-3.1-8B solves cyclic arithmetic (e.g., "what month is six months after August?") not by performing modular addition in the period of the cyclic concept (12 for months, 7 for days of the week) as…