Emergent symbol-like number variables in artificial neural networks

By Satchel Grant · Noah D. Goodman · James L. McClelland

DOI 10.48550/arxiv.2501.06141 arXiv 2501.06141

Original abstract

What types of numeric representations emerge in neural systems, and what would a satisfying answer to this question look like? In this work, we analyze Neural Network (NN) solutions to sequence-based number tasks using a variety of methods, to understand how well they can be interpreted through the lens of Symbolic Algorithms (SAs): precise programs describable by rules and typed, mutable variables. We use autoregressive GRUs, LSTMs, and Transformers trained on tasks where the correct tokens depend on numeric information that is only latent in the task structure. We show through multiple causal and theoretical methods that we can interpret raw NN activity through the lens of simplified SAs when we frame the activity in terms of neural subspaces rather than individual neurons. Using Distributed Alignment Search (DAS), we find that, depending on network architecture, dimensionality, and task specifications, alignments with SAs can be very high, only approximate, or fail altogether. We extend our analytic toolkit to address the failure cases by expanding the DAS framework to a broader class of alignment functions that more flexibly capture NN activity in terms of interpretable variables from SAs, and we provide theoretical and empirical explorations of Linear Alignment Functions (LAFs) in contrast to the preexisting Orthogonal Alignment Functions (OAFs). Through analyses of specific cases, we confirm the usefulness of causal interventions on neural subspaces for NN interpretability, and we show that recurrent models can develop graded, symbol-like number variables in their neural activity. We further show that shallow Transformers learn very different solutions than recurrent networks, and we prove that such models must use anti-Markovian solutions (solutions that do not rely on cumulative, Markovian hidden states) in the absence of sufficient attention layers.
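
The subspace interventions mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `interchange`, the alignment matrix `A`, and the subspace size `k` are hypothetical, and the matrix here is random rather than learned. In DAS the alignment is trained so that the patched network produces the counterfactual behavior predicted by the symbolic algorithm; this sketch only shows the patching step, assuming an orthogonal matrix for the OAF case and any invertible matrix for the LAF case.

```python
import numpy as np


def interchange(h_target, h_source, A, k):
    """Patch the first k aligned coordinates of h_target with those of h_source.

    A is an invertible alignment matrix (hypothetical name). An orthogonal A
    corresponds to an orthogonal alignment function; a general invertible A
    corresponds to a linear one.
    """
    z_target = A @ h_target           # target hidden state in aligned coordinates
    z_source = A @ h_source           # source hidden state in aligned coordinates
    z_patched = z_target.copy()
    z_patched[:k] = z_source[:k]      # overwrite the "variable" subspace only
    return np.linalg.solve(A, z_patched)  # map back to the neural basis


rng = np.random.default_rng(0)
d, k = 8, 2                                    # hidden size and subspace size (assumed)
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix, not a learned one
h_t = rng.normal(size=d)                       # stand-in hidden state from the target run
h_s = rng.normal(size=d)                       # stand-in hidden state from the source run
h_new = interchange(h_t, h_s, Q, k)
```

After patching, the first `k` aligned coordinates of `h_new` match the source run and the remaining coordinates match the target run, which is what lets the subspace be read as a single variable of the symbolic algorithm.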

Related work: refs + corpus + external arXiv

The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models
Micah Adler, Nir Shavit, Shashata Sawmya
2025
≈ 73%
The Function Representation of Artificial Neural Network
Zhongkui Ma
2026
≈ 72%
Bayesian Neural Networks: An Introduction and Survey
Ethan Goan and Clinton Fookes
2026
≈ 72%
The Causally Emergent Alignment Hypothesis: Causal Emergence Aligns with and Predicts Final Reward in Reinforcement Learning Agents
in corpus
2026
≈ 71%
Conceptual Views of Neural Networks: A Framework for Neuro-Symbolic Analysis
Johannes Hirth and Tom Hanika
2026
≈ 70%
Neural Operator: Is data all you need to model the world? An insight into the paradigm of data-driven scientific ML
Md Ashiqur Rahman, Abhijeet Vyas, Andrey Shor, Beatriz Medeiros, Stephanie Hernandez, Suhas Eswarappa Prameela, Aniket Bera, Hrishikesh Viswanath
2026
≈ 70%
Convergent Representations of Linguistic Constructions in Human and Artificial Neural Systems
Thomas Kinfe, Andreas Maier, Achim Schilling, Patrick Krauss, Pegah Ramezani
2026
≈ 70%
A Survey on Graph Neural Network Acceleration: Algorithms, Systems, and Customized Hardware
Atefeh Sohrabizadeh, Cheng Wan, Zijie Huang, Ziniu Hu, Yewen Wang, Yingyan (Celine) Lin, Jason Cong, Yizhou Sun, Shichang Zhang
2026
≈ 70%
Neural Cellular Automata Can Respond to Signals
James Stovold
2024
≈ 70%
What Neuroscience Can Teach AI About Learning in Continuously Changing Environments
Bruno Averbeck, Georgia Koppe, Daniel Durstewitz
2025
≈ 70%
Emergence of a phonological bias in ChatGPT
Juan Manuel Toro
2026
≈ 69%
Probability Bracket Notation: Multivariable Systems and Static Bayesian Networks
Xing M. Wang
2026
≈ 69%
Simple Mechanisms for Representing, Indexing and Manipulating Concepts
Raghu Meka, Rina Panigrahy, Kulin Shah, Yuanzhi Li
2026
≈ 69%
A graph neural network-based model with Out-of-Distribution Robustness for enhancing Antiretroviral Therapy Outcome Prediction for HIV-1
Federico Siciliano, Valerio Guarrasi, Anne-Mieke Vandamme, Valeria Ghisetti, Anders Sönnerborg, Maurizio Zazzi, Fabrizio Silvestri, Laura Palagi, Giulia Di Teodoro
2026
≈ 69%
Emergence and Causality in Complex Systems: A Survey on Causal Emergence and Related Quantitative Studies
in corpus
2023
≈ 69%
There Will Be a Scientific Theory of Deep Learning
Daniel Kunin, Alexander Atanasov, Enric Boix-Adserà, Blake Bordelon, Jeremy Cohen, Nikhil Ghosh, Florentin Guth, Arthur Jacot, Mason Kamb, Dhruva Karkada, Eric J. Michaud, Berkan Ottlik, Joseph Turnbull, Jamie Simon
2026
≈ 69%
Neural networks leverage nominally quantum and post-quantum representations
Paul M. Riechers and Thomas J. Elliott and Adam S. Shai
2025
≈ 69%
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
in corpus
2023
≈ 68%
Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts
in corpus
2026
≈ 67%
Learning without neurons in physical systems
in corpus
2022
≈ 67%
Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
in corpus
≈ 67%
The Platonic Representation Hypothesis
in corpus
2024
≈ 67%
Taking AI Welfare Seriously
in corpus
2024
≈ 66%
The World Inside Neural Networks
in corpus
2026
≈ 66%
AI as a Buddhist Self-Overcoming Technique in Another Medium
in corpus
2025
≈ 66%
Addressing divergent representations from causal interventions on neural networks
in corpus
2025
≈ 65%
Why Learning Requires Feeling
in corpus
2026
≈ 65%
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
in corpus
2023
≈ 64%
Zoom In: An Introduction to Circuits
in corpus
2020
≈ 64%
A Mathematical Framework for Transformer Circuits
in corpus
2021
≈ 64%

Cited by (4)

Addressing divergent representations from causal interventions on neural networks
Causal intervention methods central to mechanistic interpretability—including activation patching, mean-difference vector patching, Sparse Autoencoders, and Distributed Alignment Search (DAS)—systemat
Model Alignment Search
Model Alignment Search (MAS) establishes bidirectional causal similarity between neural networks by learning a per-model orthogonal rotation matrix that isolates behaviorally relevant subspaces and us
Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
Manifold steering — intervening on model activations along paths constrained to lie on a learned activation manifold M_h rather than along Euclidean linear directions — produces behavioral trajectorie
Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts
Llama-3.1-8B solves cyclic arithmetic (e.g., "what month is six months after August?") not by performing modular addition in the period of the cyclic concept (12 for months, 7 for days of the week) as