paper:arxiv-2501-06141
Emergent symbol-like number variables in artificial neural networks
Original abstract
What types of numeric representations emerge in neural systems, and what would a satisfying answer to this question look like? In this work, we interpret Neural Network (NN) solutions to sequence-based number tasks using a variety of methods to understand how well we can interpret them through the lens of interpretable Symbolic Algorithms (SAs) -- precise programs describable by rules and typed, mutable variables. We use autoregressive GRUs, LSTMs, and Transformers trained on tasks where the correct tokens depend on numeric information only latent in the task structure. We show through multiple causal and theoretical methods that we can interpret raw NN activity through the lens of simplified SAs when we frame the activity in terms of neural subspaces rather than individual neurons. Using Distributed Alignment Search (DAS), we find that, depending on network architecture, dimensionality, and task specifications, alignments with SAs can be very high, or they can be only approximate, or fail altogether. We extend our analytic toolkit to address the failure cases by expanding the DAS framework to a broader class of alignment functions that more flexibly capture NN activity in terms of interpretable variables from SAs, and we provide theoretic and empirical explorations of Linear Alignment Functions (LAFs) in contrast to the preexisting Orthogonal Alignment Functions (OAFs). Through analyses of specific cases we confirm the usefulness of causal interventions on neural subspaces for NN interpretability, and we show that recurrent models can develop graded, symbol-like number variables in their neural activity. We further show that shallow Transformers learn very different solutions than recurrent networks, and we prove that such models must use anti-Markovian solutions -- solutions that do not rely on cumulative, Markovian hidden states -- in the absence of sufficient attention layers.
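The abstract's central operation, a causal intervention on a neural subspace, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that an alignment function is an invertible matrix applied to a hidden state, that an OAF is an orthogonal matrix (so its inverse is its transpose), and that a LAF is any invertible matrix whose inverse is computed explicitly. All names (`interchange`, `n_swap`, `rotation`, `skew`) and the 2-dimensional toy vectors are hypothetical. In DAS the matrix is learned, whereas here it is fixed by hand.

```python
import math

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def inverse_2x2(m):
    """Explicit inverse of a 2x2 matrix (enough for this toy example)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("alignment matrix must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def interchange(base, source, align, align_inv, n_swap):
    """Patch the first n_swap aligned coordinates of base with those of source.

    Both hidden states are mapped into the aligned space, the chosen
    coordinates are swapped in from the source, and the result is mapped
    back into the original neuron space.
    """
    z_base = matvec(align, base)
    z_source = matvec(align, source)
    z_mixed = z_source[:n_swap] + z_base[n_swap:]
    return matvec(align_inv, z_mixed)

# Toy hidden states (assumed values, not taken from the paper).
h_base = [1.0, 2.0]
h_source = [3.0, -1.0]

# Orthogonal alignment: a rotation, inverted by its transpose.
theta = math.pi / 6
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
h_oaf = interchange(h_base, h_source, rotation, transpose(rotation), 1)

# Linear alignment: an invertible but non-orthogonal matrix.
skew = [[1.0, 0.5],
        [0.0, 2.0]]
h_laf = interchange(h_base, h_source, skew, inverse_2x2(skew), 1)
```

In both cases the patched state carries the source's value in the first aligned coordinate and the base's value in the second. The only difference in this sketch is whether the inverse comes for free from the transpose (orthogonal case) or has to be computed (general linear case).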
Related work: refs + corpus + external arXiv
The Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows are weighted higher.
- The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models (Micah Adler, Nir Shavit, Shashata Sawmya, 2025) ≈ 73%
- Conceptual Views of Neural Networks: A Framework for Neuro-Symbolic Analysis (Johannes Hirth and Tom Hanika, 2026) ≈ 70%
- Neural Operator: Is data all you need to model the world? An insight into the paradigm of data-driven scientific ML (Md Ashiqur Rahman, Abhijeet Vyas, Andrey Shor, Beatriz Medeiros, Stephanie Hernandez, Suhas Eswarappa Prameela, Aniket Bera, Hrishikesh Viswanath, 2026) ≈ 70%
- Convergent Representations of Linguistic Constructions in Human and Artificial Neural Systems (Thomas Kinfe, Andreas Maier, Achim Schilling, Patrick Krauss, Pegah Ramezani, 2026) ≈ 70%
- A Survey on Graph Neural Network Acceleration: Algorithms, Systems, and Customized Hardware (Atefeh Sohrabizadeh, Cheng Wan, Zijie Huang, Ziniu Hu, Yewen Wang, Yingyan (Celine) Lin, Jason Cong, Yizhou Sun, Shichang Zhang, 2026) ≈ 70%
- What Neuroscience Can Teach AI About Learning in Continuously Changing Environments (Bruno Averbeck, Georgia Koppe, Daniel Durstewitz, 2025) ≈ 70%
- Simple Mechanisms for Representing, Indexing and Manipulating Concepts (Raghu Meka, Rina Panigrahy, Kulin Shah, Yuanzhi Li, 2026) ≈ 69%
- A graph neural network-based model with Out-of-Distribution Robustness for enhancing Antiretroviral Therapy Outcome Prediction for HIV-1 (Federico Siciliano, Valerio Guarrasi, Anne-Mieke Vandamme, Valeria Ghisetti, Anders Sönnerborg, Maurizio Zazzi, Fabrizio Silvestri, Laura Palagi, Giulia Di Teodoro, 2026) ≈ 69%
- Emergence and Causality in Complex Systems: A Survey on Causal Emergence and Related Quantitative Studies (in corpus, 2023) ≈ 69%
- There Will Be a Scientific Theory of Deep Learning (Daniel Kunin, Alexander Atanasov, Enric Boix-Adserà, Blake Bordelon, Jeremy Cohen, Nikhil Ghosh, Florentin Guth, Arthur Jacot, Mason Kamb, Dhruva Karkada, Eric J. Michaud, Berkan Ottlik, Joseph Turnbull, Jamie Simon, 2026) ≈ 69%
- Neural networks leverage nominally quantum and post-quantum representations (Paul M. Riechers, Thomas J. Elliott, and Adam S. Shai, 2025) ≈ 69%
- Learning without neurons in physical systems (in corpus, 2022) ≈ 67%
- The Platonic Representation Hypothesis (in corpus, 2024) ≈ 67%
- Taking AI Welfare Seriously (in corpus, 2024) ≈ 66%
- The World Inside Neural Networks (in corpus, 2026) ≈ 66%
- Why Learning Requires Feeling (in corpus, 2026) ≈ 65%
- Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations (in corpus, 2023) ≈ 64%
- Zoom In: An Introduction to Circuits (in corpus, 2020) ≈ 64%
- A Mathematical Framework for Transformer Circuits (in corpus, 2021) ≈ 64%
Cited by (4)
- Addressing divergent representations from causal interventions on neural networks
Causal intervention methods central to mechanistic interpretability—including activation patching, mean-difference vector patching, Sparse Autoencoders, and Distributed Alignment Search (DAS)—systemat
- Model Alignment Search
Model Alignment Search (MAS) establishes bidirectional causal similarity between neural networks by learning a per-model orthogonal rotation matrix that isolates behaviorally relevant subspaces and us
- Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
Manifold steering — intervening on model activations along paths constrained to lie on a learned activation manifold M_h rather than along Euclidean linear directions — produces behavioral trajectorie
- Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts
Llama-3.1-8B solves cyclic arithmetic (e.g., "what month is six months after August?") not by performing modular addition in the period of the cyclic concept (12 for months, 7 for days of the week) as