Topological constraints on self-organisation in locally interacting systems

By Francesco Sacco · Dalton A R Sakthivadivel · Michael Levin (Allen Discovery Center at Tufts University; + 10 more)

DOI 10.48550/arxiv.2501.13188 arXiv 2501.13188 OpenAlex W4406785232

Multiscale agency & collective intelligence · Bioelectric cognition & collective individuality · fitness pressure · stochastic thermodynamics · autoregressive modeling · formal thought disorder · causally-masked attention · Glauber dynamics · no-go theorem for long-range order · problem space navigation · spontaneous magnetisation · stored patterns · thermodynamic limit

TL;DR

The topology of local interactions determines whether a system can sustain long-range order, and decoder-only transformer architectures are provably unable to maintain such order over arbitrarily long output sequences. The paper generalises the Landau–Lifshitz scaling argument and Peierls' domain-wall counting to a broad universality class through a Topological Equivalence Theorem (Theorem 1): any local Hamiltonian on a graph has a free energy asymptotically equivalent to that of a nearest-neighbour Ising model on the same combinatorial structure, so the existence or non-existence of a phase transition reduces entirely to graph topology. Three model systems are analysed: the one-dimensional windowed Potts model, AR(ω) autoregressive models (Corollary 2), and hierarchical clique networks. For the Potts chain, the entropy gained by a domain wall grows as log(L−1) while its energy cost is bounded by ωE^max, so the free-energy change ΔF becomes negative for sufficiently large sequence length L at any nonzero temperature. Causally-masked attention with a finite context window ω maps onto the AR(ω) framework (Proposition 2, via Theorem 3) and inherits the same no-go result. Conversely, biological systems organised as nested cliques (cells forming tissues, tissues forming organs) admit a non-empty critical temperature range of hierarchical order (Proposition 3), which exists when the number of cliques ℓ, the number of flipped cliques r, and the largest clique size nmax satisfy ℓ/r > nmax^nmax/e. The paper argues that this gives a principled thermodynamic explanation for why autoregressive LLMs exhibit coherence failures on long tasks while multicellular organisms maintain large-scale morphogenetic order, and proposes that stigmergy and embodiment function as evolutionary responses to this topological no-go constraint.
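The scaling argument for the Potts chain can be checked numerically. The following is a minimal sketch, assuming the form of the bound stated in the takeaways, ΔF ≤ ωE^max − T log[(m−1)(L−1)]; the function names and the example parameter values are illustrative and do not come from the paper.

```python
import math

def delta_f_bound(L, omega, e_max, T, m):
    """Upper bound on the free-energy change of one domain wall:
    omega * E^max - T * log[(m - 1) * (L - 1)]."""
    return omega * e_max - T * math.log((m - 1) * (L - 1))

def min_unstable_length(omega, e_max, T, m):
    """Smallest integer chain length L at which the bound is strictly negative.

    The bound is negative exactly when (m - 1)(L - 1) > exp(omega * E^max / T),
    i.e. when L > 1 + exp(omega * E^max / T) / (m - 1).
    """
    threshold = 1.0 + math.exp(omega * e_max / T) / (m - 1)
    return math.floor(threshold) + 1

# Illustrative values only (hypothetical, not taken from the paper).
omega, e_max, T, m = 4, 1.0, 1.0, 2
L_star = min_unstable_length(omega, e_max, T, m)
print(L_star, delta_f_bound(L_star, omega, e_max, T, m))
```

The threshold length grows exponentially in ωE^max/T, so a larger window or a lower temperature postpones the point at which a domain wall becomes favourable but never removes it, as long as T > 0.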

What to take away

1. Theorem 1 (the Topological Equivalence Theorem) proves that all local Hamiltonians on graphs with the same combinatorial structure have asymptotically equivalent free energies, reducing the question of phase transitions to nearest-neighbour Ising models on the same graph.
2. For a one-dimensional windowed Potts chain of length L with context window ω and maximum window energy E^max, the change in free energy satisfies ΔF ≤ ωE^max − T log[(m−1)(L−1)], which becomes negative for sufficiently large L at any T > 0, proving no ordered phase exists (Theorem 2).
3. Any AR(ω) autoregressive model is formally equivalent to a one-dimensional local Hamiltonian with window length ω (Theorem 3), so by Theorem 2 no autoregressive model can converge to a single stored pattern for finite inverse temperature β (Corollary 2).
4. Causally-masked attention in a decoder-only transformer with finite context length ω maps exactly onto an AR(ω) model via the modern Hopfield network formulation, making Proposition 2 a direct corollary: such architectures have no ordered phase.
5. Hierarchical clique graphs admit a non-empty critical temperature range of local order with global disorder when ℓ/r > nmax^nmax/e, where ℓ is the number of cliques, r the number of flipped cliques, and nmax the largest clique size (Proposition 3).
6. Within a clique of ni spins, uniform magnetisation is maintained whenever the coupling J satisfies 2J > T log ni, providing an explicit parameter condition linking coupling strength, temperature, and clique size to coherent local order (Theorem 4).
7. Lemma 1 shows that the asymptotics of any local Hamiltonian depend only on perimeter length P, not on energy levels E^min/E^max or window sizes ω1/ω2, since both bounds scale as O(P).
8. An open question raised is whether glassy systems—including Hopfield networks and Edwards–Anderson spin glasses—admit analogous topological constraints on phases with frozen disorder, which the authors flag as forthcoming work.
9. To replicate the hierarchical clique analysis, construct a graph G with ℓ independent cliques of sizes n1,…,nℓ (all ni > 2), assume uniform coupling J, compute ΔF = Σ_γ 2J n_{i_γ} − T log C(ℓ, r), where the sum runs over the r flipped cliques γ and C(ℓ, r) is the binomial coefficient, and verify the temperature inequality 2J/log(nmax) > T > 2J r nmax/(r log ℓ − r log r + r).
10. The paper hypothesises that stigmergy and embodiment in biological systems—and cognitive behavioural therapy interventions for formal thought disorder in schizophrenia—are functional analogues that overcome topological ordering limitations by extending the effective interaction graph through environmental coupling.
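The replication recipe in item 9 can be sketched in a few lines. This is a hedged illustration, assuming natural logarithms, uniform coupling J > 0, and that C(ℓ, r) denotes the binomial coefficient; the function names and the example graph (20 cliques of size 3) are hypothetical.

```python
import math

def temperature_window(J, clique_sizes, r):
    """Return (lower, upper) from 2J/log(nmax) > T > 2J r nmax/(r log l - r log r + r)."""
    ell = len(clique_sizes)
    n_max = max(clique_sizes)
    upper = 2 * J / math.log(n_max)
    lower = 2 * J * r * n_max / (r * math.log(ell) - r * math.log(r) + r)
    return lower, upper

def window_is_nonempty(clique_sizes, r):
    """Check l/r > nmax^nmax / e, written in log form to avoid large powers."""
    ell = len(clique_sizes)
    n_max = max(clique_sizes)
    return math.log(ell / r) + 1 > n_max * math.log(n_max)

def delta_f(J, flipped_sizes, ell, T):
    """Free-energy change for flipping the given cliques, with an exact binomial.

    The window's lower edge corresponds to replacing log C(l, r) by
    r*log(e*l/r), which is an upper bound on it.
    """
    r = len(flipped_sizes)
    return sum(2 * J * n for n in flipped_sizes) - T * math.log(math.comb(ell, r))

# Hypothetical example: 20 cliques of 3 spins each, one clique flipped.
sizes = [3] * 20
lower, upper = temperature_window(1.0, sizes, r=1)
print(window_is_nonempty(sizes, r=1), lower, upper)
```

With only 5 cliques of size 3 the same check fails, since ℓ/r = 5 is below 3^3/e ≈ 9.93, and the computed lower edge then exceeds the upper edge.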

Peer brief — for seminar discussion

The paper derives necessary topological conditions for the existence of an ordered phase in systems with local interactions, unifying spin-model physics, autoregressive language models, and hierarchical biological networks under a single thermodynamic framework. Starting from the Landau–Lifshitz domain-wall scaling argument for the 1D Ising model and Peierls' contrasting argument for the 2D case, the paper introduces the Topological Equivalence Theorem (Theorem 1), which establishes that any local Hamiltonian, defined as a sum of windowed Hamiltonians of finite interaction range ω, has a free energy asymptotically equivalent to that of a nearest-neighbour Ising model on the same graph. The capacity for self-organisation therefore depends only on the combinatorial topology of the interaction graph, not on the number of stored patterns m, the specific energy levels E^max or E^min, or the window size ω. Three model systems are worked through: the one-dimensional windowed Potts model (Theorem 2), AR(ω) autoregressive models (Theorem 3, Corollary 2), and hierarchical clique networks (Theorem 4, Proposition 3). The load-bearing finding is that, because decoder-only transformer architectures perform autoregression over a finite context window ω, they map directly onto a 1D local Hamiltonian and therefore provably cannot sustain long-range order. This is offered as a formal thermodynamic explanation for the empirically observed coherence failures in long-task performance documented in benchmarks such as Kwa et al. 2025 (arXiv:2503.14499). Conversely, biological multiscale systems organised as nested cliques admit a non-empty critical temperature range satisfying 2J/log(nmax) > T > 2J r nmax/(r log ℓ − r log r + r), enabling local order within cliques while global disorder persists across them, which the paper argues recapitulates the tissue-level coherence and organ-level heterogeneity seen in morphogenesis.
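The temperature range above and the condition ℓ/r > nmax^nmax/e quoted in the summary are two forms of the same statement. A short derivation, assuming J > 0, natural logarithms, nmax > 2 (so log nmax > 0) and ℓ ≥ r ≥ 1 (so the denominator is positive), shows when the range is non-empty:

```latex
\begin{align*}
\frac{2J}{\log n_{\max}} &> \frac{2J\, r\, n_{\max}}{r\log\ell - r\log r + r}
  = \frac{2J\, n_{\max}}{\log(\ell/r) + 1} \\
\iff\quad \log(\ell/r) + 1 &> n_{\max}\log n_{\max} \\
\iff\quad \frac{\ell}{r} &> \frac{n_{\max}^{\,n_{\max}}}{e}
\end{align*}
```

The coupling J cancels, so whether the window exists depends only on the combinatorial quantities ℓ, r and nmax, while J sets the temperature scale at which it sits.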
The implied prediction is that any architecture capable of long-range coherence must have an interaction topology with higher-dimensional combinatorial structure than a 1D chain—specifically, the kind of hierarchical clique organisation prevalent in biological neural and cellular systems. An alternative method the paper could have used is direct replica-method computation of quenched disorder in spin-glass formulations (as in the Edwards–Anderson model or Hopfield network saturation analysis of Amit, Gutfreund, and Sompolinsky 1987), which would access glassy phases the current Landau-theory framework explicitly sets aside. The most contestable aspect is the equilibrium assumption underlying Theorem 1: the Bogoliubov inequality bound F ≤ F0 + ⟨H − H0⟩0 is tight only near equilibrium, and real transformer inference is a non-equilibrium sampling process with finite compute budgets, anisotropic loss landscapes, and temperature scheduling—all of which could alter the scaling regime in ways the paper acknowledges but does not resolve. A critical reader would also push back on the treatment of attention as a fixed-window AR(ω) model: modern architectures use dynamic retrieval, sparse attention, and retrieval-augmented generation that effectively extend or restructure the interaction graph, potentially lifting the 1D topology constraint in ways that are not captured by the finite-ω formalism used here.

Methods (2)

autoregressive modeling
Statistical technique where outputs are regressed on previous values; used in language generation
causally-masked attention
Attention mechanism with causal mask limiting each token's view to previous tokens; used in decoder-only transformers
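A causal mask with a finite context window can be illustrated directly. This is a minimal pure-Python sketch, assuming the window of ω positions includes the current token (the source does not specify this convention); the function names are illustrative and the code does not reproduce any particular library's attention implementation.

```python
import math

def causal_window_mask(seq_len, omega):
    """mask[i][j] is True when position i may attend to position j:
    j is not in the future (j <= i) and lies within the last omega positions."""
    return [[(j <= i) and (j > i - omega) for j in range(seq_len)]
            for i in range(seq_len)]

def masked_softmax_row(scores, mask_row):
    """Softmax over one row of attention scores, with masked entries set to zero."""
    visible = [s for s, keep in zip(scores, mask_row) if keep]
    peak = max(visible)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if keep else 0.0
            for s, keep in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]

mask = causal_window_mask(seq_len=5, omega=2)
print(mask[3])
print(masked_softmax_row([0.0] * 5, mask[3]))
```

Each row has at most ω visible entries, which is the sense in which a token's output depends only on a bounded window of earlier tokens.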

Frameworks (1)

stochastic thermodynamics
Framework for non-equilibrium free energies; mentioned as future direction to generalise results

Findings (9)

For a graph with independent cliques, individual cliques may flip magnetisation while remaining uniformly magnetised if the intra-clique coupling J exceeds (T/2) log n_i (Theorem 4)
Condition for hierarchical order with locally coherent but globally varying phases
In hierarchical systems with independent cliques, there exist parameter regimes where individual cliques maintain uniform magnetisation while others flip.
Shows how hierarchical topology enables local order within global flexibility; explains biological multiscale organization
For one-dimensional local Hamiltonian with m>1 stored patterns at non-zero temperature, domain wall formation is thermodynamically favourable (Theorem 2)
No ordered phase in 1D with multiple stored patterns
All local Hamiltonians on lattices with the same combinatorial structure have asymptotically equivalent free energies (Theorem 1)
Topological equivalence theorem for local Hamiltonians
At thermal equilibrium, ability to converge to an ordered phase is independent of energy levels and window sizes (Lemma 1)
Scaling argument depends only on perimeter, not details of energy magnitudes or window length
Autoregressive model unable to converge to a single stored pattern for any finite β (Corollary 2)
Consequence of Theorem 3 and the 1D no-order result (Theorem 2)
A unique local Hamiltonian with window length ω can be associated to any AR(ω) model (Theorem 3)
Mapping autoregressive models to spin systems
There exists a non-empty critical temperature range of hierarchical behaviour (Proposition 3)
Proof that the conditions of Theorem 4 are realisable in a range of temperatures
Causally-masked attention in a decoder-only model has no ordered phase (Proposition 2)
Application to transformer language models
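Theorem 3 is listed above without its construction. One natural way to see why such a mapping is plausible is to define the energy of a sequence as the negative log-probability the autoregressive model assigns to it, so that the Boltzmann weight exp(−βH) equals the product of the conditionals. The sketch below is an assumption for illustration, not necessarily the construction used in the paper's proof; `sticky_cond_prob` is a hypothetical toy model.

```python
import math

def windowed_energy(sequence, cond_prob, omega, beta=1.0):
    """H(x) = -(1/beta) * sum_t log p(x_t | previous omega tokens).

    Each term involves a bounded window of consecutive tokens,
    so H is a sum of local (windowed) contributions.
    """
    energy = 0.0
    for t, token in enumerate(sequence):
        context = tuple(sequence[max(0, t - omega):t])
        energy -= math.log(cond_prob(token, context)) / beta
    return energy

def sticky_cond_prob(token, context, stay=0.9, vocab=(0, 1)):
    """Hypothetical toy model: repeat the previous token with probability `stay`."""
    if not context:
        return 1.0 / len(vocab)
    if token == context[-1]:
        return stay
    return (1.0 - stay) / (len(vocab) - 1)

beta = 2.0
seq = [0, 0, 1, 1]
H = windowed_energy(seq, sticky_cond_prob, omega=1, beta=beta)
# The Boltzmann weight recovers the AR sequence probability 0.5 * 0.9 * 0.1 * 0.9.
print(math.exp(-beta * H))
```

In this toy model a change of token plays the role of a domain wall: it costs a fixed, finite amount of energy regardless of sequence length, which is the ingredient the 1D scaling argument relies on.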

Claims (12)

The inability of autoregressive large language models to maintain states of long-range order resembles tangential speech or derailment in formal thought disorder.
Analogy between LLM incoherence and schizophrenia symptoms
The results generalise readily to non-equilibrium systems where scaling relationships remain similar (e.g., dynamic or localised scaling).
Claim about broader applicability of the scaling argument
All intelligence is collective intelligence, in the sense that it is made of parts which must align with respect to system-level goals.
Opening axiom of the paper, a fundamental interpretive stance
Decoder-only transformer architectures are fundamentally limited in generating long, coherent sequences due to lack of ordered phase.
Interpretation of Proposition 2 as a fundamental limitation on LLMs
Practical context length limitations in language models lead to forgetting outside the window, constraining coherence over time.
Claim about engineering constraint reinforcing the theoretical no-order result
The difference between simple language models and multicellular organisms goes beyond the substrate of intelligence considered.
Claim that topologies, not material substrates, account for differing organisational abilities
Topology is the critical factor differentiating the self-organising capabilities of biological systems and language models.
Central interpretive claim of the paper: the ability to maintain long-range order is determined by interaction topology, not substrate.
Hierarchical structures in biological systems enable local order while globally disordered, explaining complex patterning.
Claim that multiscale organisation produces complex patterns via clique-based local coherence
Self-organisation can be viewed as a form of autopoietic cognition navigating problem spaces toward target morphologies.
Linking self-organisation to cognition and navigation of configuration space
Hierarchical structure in interaction topology enables complex multiscale patterns that cannot exist in flat networks.
Explains why biological systems achieve organization across scales while language models struggle; grounds in free energy scaling

Hypotheses (2)

We hypothesise this explains why stigmergy and other forms of extracellular signalling arise in biological systems, which is known to enhance the ability for a collective system to order itself.
Hypothesis connecting fitness pressure from topological constraints to the evolutionary origin of stigmergy
We hypothesise that an embodied world model, extending the system in space and time by its interactions with an environment, can be leveraged to maintain coherence.
Proposed solution to the topological limitation, linking embodiment to coherence

Questions (1)

What is the functional distinction between simple language models and multicellular organisms, and can generative AI harness that property to achieve long-range order?
Core motivating question; drives investigation of topological differences between biological and artificial systems

Original abstract

All intelligence is collective intelligence, in the sense that it is made of parts which must align with respect to system-level goals. Understanding the dynamics which facilitate or limit navigation of problem spaces by aligned parts thus impacts many fields ranging across life sciences and engineering. To that end, consider a system on the vertices of a planar graph, with pairwise interactions prescribed by the edges of the graph. Such systems can sometimes exhibit long-range order, distinguishing one phase of macroscopic behaviour from another. In networks of interacting systems we may view spontaneous ordering as a form of self-organisation, modelling neural and basal forms of cognition. Here, we discuss necessary conditions on the topology of the graph for an ordered phase to exist, with an eye towards finding constraints on the ability of a system with local interactions to maintain an ordered target state. By studying the scaling of free energy under the formation of domain walls in three model systems -- the Potts model, autoregressive models, and hierarchical networks -- we show how the combinatorics of interactions on a graph prevent or allow spontaneous ordering. As an application we are able to analyse why multiscale systems like those prevalent in biology are capable of organising into complex patterns, whereas rudimentary language models are challenged by long sequences of outputs.
