Topological constraints on self-organisation in locally interacting systems (doi:10.48550/arXiv.2501.13188)
TL;DR
Topology of local interactions is the decisive factor determining whether a system can sustain long-range order, and decoder-only transformer architectures are provably unable to maintain such order over arbitrarily long output sequences. By generalizing the Landau–Lifshitz scaling argument and Peierls' domain-wall counting to a broad universality class via a Topological Equivalence Theorem (Theorem 1), the paper shows that any local Hamiltonian on a graph has a free energy asymptotically equivalent to that of a nearest-neighbour Ising model on the same combinatorial structure, so the existence or non-existence of a phase transition reduces entirely to graph topology. Three model systems are analyzed: the one-dimensional windowed Potts model, AR(ω) autoregressive models (Corollary 2), and hierarchical clique networks. For the Potts chain, the domain-wall entropy grows as log[(m−1)(L−1)] while the energy cost is bounded by ωE^max, forcing ΔF negative for sufficiently large sequence length L at any nonzero temperature. Transformer attention with a finite context window ω maps directly onto the AR(ω) framework (Proposition 2 via Theorem 3), inheriting the same no-go result. Conversely, biological systems organized as nested cliques (cells forming tissues, tissues forming organs) admit a non-empty critical temperature range of hierarchical order (Proposition 3), achievable when the clique count ℓ, the number of flipped cliques r, and the maximal clique size n_max satisfy ℓ/r > n_max^(n_max)/e. The paper argues this constitutes a principled thermodynamic explanation for why autoregressive LLMs exhibit coherence failures on long tasks while multicellular organisms maintain large-scale morphogenetic order, and proposes that stigmergy and embodiment function as evolutionary responses to this topological no-go constraint.
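A minimal numerical sketch of the Potts-chain bound ΔF ≤ ωE^max − T log[(m−1)(L−1)] (takeaway 2, Theorem 2). It evaluates the bound and the length L* = 1 + exp(ωE^max/T)/(m−1) at which it crosses zero. The parameter values are hypothetical, and the comments read the two terms as an energy cost and an entropy gain, which is our interpretation of the bound rather than a quotation of the paper:

```python
import math


def delta_f_upper_bound(L: float, omega: int, e_max: float, m: int, T: float) -> float:
    """Upper bound on the free-energy change for one domain wall in a
    windowed Potts chain: Delta F <= omega * E_max - T * log[(m - 1)(L - 1)]."""
    if m < 2 or L < 2 or T <= 0:
        raise ValueError("need m >= 2, L >= 2 and T > 0")
    # Energy term of the bound: omega windows, each charged at most E_max.
    energy_cost = omega * e_max
    # Entropy term: (m - 1)(L - 1) counted wall configurations.
    entropy = math.log((m - 1) * (L - 1))
    return energy_cost - T * entropy


def critical_length(omega: int, e_max: float, m: int, T: float) -> float:
    """Length at which the bound crosses zero: L* = 1 + exp(omega * E_max / T) / (m - 1)."""
    return 1.0 + math.exp(omega * e_max / T) / (m - 1)


if __name__ == "__main__":
    # Hypothetical parameters: window 8, E_max = 1, two patterns, T = 0.5.
    L_star = critical_length(8, 1.0, 2, 0.5)
    print(f"L* ~ {L_star:.3e}")
    for L in (10**6, 10**7):
        print(L, delta_f_upper_bound(L, 8, 1.0, 2, 0.5))
```

Because L* grows like exp(ωE^max/T), widening the window or lowering T pushes the onset of disorder out exponentially but never removes it at T > 0; for the example parameters L* is roughly 8.9 million.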
What to take away
- 1. Theorem 1 (the Topological Equivalence Theorem) proves that all local Hamiltonians on graphs with the same combinatorial structure have asymptotically equivalent free energies, reducing the question of phase transitions to nearest-neighbour Ising models on the same graph.
- 2. For a one-dimensional windowed Potts chain of length L with m > 1 stored patterns, context window ω, and maximum window energy E^max, the change in free energy satisfies ΔF ≤ ωE^max − T log[(m−1)(L−1)], which becomes negative for sufficiently large L at any T > 0, proving no ordered phase exists (Theorem 2).
- 3. Any AR(ω) autoregressive model is formally equivalent to a one-dimensional local Hamiltonian with window length ω (Theorem 3), so by Theorem 2 no autoregressive model can converge to a single stored pattern for finite inverse temperature β (Corollary 2).
- 4. Causally-masked attention in a decoder-only transformer with finite context length ω maps exactly onto an AR(ω) model via the modern Hopfield network formulation, making Proposition 2 a direct corollary: such architectures have no ordered phase.
- 5. Hierarchical clique graphs admit a non-empty critical temperature range of local order with global disorder when ℓ/r > n_max^(n_max)/e, where ℓ is the number of cliques, r the number of flipped cliques, and n_max the largest clique size (Proposition 3).
- 6. Within a clique of n_i spins, uniform magnetisation is maintained whenever the coupling J satisfies 2J > T log n_i, providing an explicit condition linking coupling strength, temperature, and clique size to coherent local order (Theorem 4).
- 7. Lemma 1 shows that the asymptotics of any local Hamiltonian depend only on the perimeter length P, not on the energy levels E^min and E^max or the window sizes ω₁ and ω₂, since both the upper and the lower bound scale as O(P).
- 8. An open question raised is whether glassy systems—including Hopfield networks and Edwards–Anderson spin glasses—admit analogous topological constraints on phases with frozen disorder, which the authors flag as forthcoming work.
- 9. To replicate the hierarchical clique analysis, a researcher should construct a graph G with ℓ independent cliques of sizes n_1, …, n_ℓ (all n_i > 2), assume uniform coupling J, compute ΔF = Σ_{γ=1}^{r} 2J n_{i_γ} − T log C(ℓ, r) for r flipped cliques i_1, …, i_r, and verify the temperature inequality 2J / log(n_max) > T > 2J r n_max / (r log ℓ − r log r + r).
- 10. The paper hypothesises that stigmergy and embodiment in biological systems—and cognitive behavioural therapy interventions for formal thought disorder in schizophrenia—are functional analogues that overcome topological ordering limitations by extending the effective interaction graph through environmental coupling.
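The replication recipe in takeaway 9 can be sketched directly. This minimal example, with hypothetical parameter values, computes the temperature window, checks numerically that it is non-empty exactly when ℓ/r > n_max^(n_max)/e, and evaluates ΔF with the exact binomial coefficient C(ℓ, r). It only restates the summarised inequalities; it is not the paper's code:

```python
import math
from typing import Sequence, Tuple


def temperature_window(J: float, ell: int, r: int, n_max: int) -> Tuple[float, float]:
    """Return (T_low, T_high) from the inequality in takeaway 9:
    2J / log(n_max) > T > 2J r n_max / (r log(ell) - r log(r) + r)."""
    t_high = 2 * J / math.log(n_max)
    t_low = 2 * J * r * n_max / (r * math.log(ell) - r * math.log(r) + r)
    return t_low, t_high


def window_nonempty(ell: int, r: int, n_max: int) -> bool:
    """Proposition 3's condition as summarised: ell / r > n_max ** n_max / e."""
    return ell / r > n_max ** n_max / math.e


def delta_f_flip(J: float, T: float, flipped_sizes: Sequence[int], ell: int) -> float:
    """Delta F = sum over flipped cliques of 2 J n_i - T log C(ell, r), exact binomial."""
    r = len(flipped_sizes)
    return sum(2 * J * n for n in flipped_sizes) - T * math.log(math.comb(ell, r))


if __name__ == "__main__":
    # Hypothetical example: 100 cliques of size 3, one flipped, J = 1.
    t_low, t_high = temperature_window(1.0, 100, 1, 3)
    print(t_low, t_high, window_nonempty(100, 1, 3))
    T = 1.5  # a temperature inside the window
    print("intra-clique order (2J > T log n):", 2 * 1.0 > T * math.log(3))
    print("Delta F for one flip:", delta_f_flip(1.0, T, [3], 100))
```

For these values the window is roughly 1.07 < T < 1.82; at T = 1.5 the within-clique condition 2J > T log n_i of Theorem 4 holds and flipping one clique lowers the free energy.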
Peer brief — for seminar discussion
The paper derives necessary topological conditions for the existence of an ordered phase in systems with local interactions, unifying spin-model physics, autoregressive language models, and hierarchical biological networks under a single thermodynamic framework. Starting from the Landau–Lifshitz domain-wall scaling argument for the 1D Ising model and Peierls' 2D counterargument, the paper introduces the Topological Equivalence Theorem (Theorem 1), which establishes that any local Hamiltonian (defined as a sum of windowed Hamiltonians of finite interaction range ω) has a free energy asymptotically equivalent to that of a nearest-neighbour Ising model on the same graph. This means the capacity for self-organisation depends only on the combinatorial topology of the interaction graph, not on the number of stored patterns m, the specific energy levels E^max or E^min, or the window size ω. Three model systems are worked through: the one-dimensional windowed Potts model (Theorem 2), AR(ω) autoregressive models (Theorem 3, Corollary 2), and hierarchical clique networks (Theorem 4, Proposition 3). The load-bearing finding is that because decoder-only transformer architectures perform autoregression over a finite context window ω, they map directly onto a 1D local Hamiltonian and therefore provably cannot sustain long-range order. This offers a formal thermodynamic explanation for the empirically observed coherence failures in long-task performance documented in benchmarks such as Kwa et al. 2025 (arXiv:2503.14499). Conversely, biological multiscale systems organised as nested cliques admit a non-empty critical temperature range satisfying 2J/log(n_max) > T > 2J r n_max/(r log ℓ − r log r + r), enabling local order within cliques while global disorder persists across them, which the paper argues recapitulates the tissue-level coherence and organ-level heterogeneity seen in morphogenesis.
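The temperature range above is non-empty exactly when its lower bound lies below its upper bound. Cross-multiplying, under the assumptions J > 0, n_max > 2 and 1 ≤ r ≤ ℓ (so both denominators are positive), recovers the condition ℓ/r > n_max^(n_max)/e quoted for Proposition 3; note that the lower bound simplifies to 2J n_max / (log(ℓ/r) + 1):

```latex
% Assumes J > 0, n_max > 2 and 1 <= r <= ell, so both denominators are positive.
\begin{aligned}
\frac{2J}{\log n_{\max}} > \frac{2J\,r\,n_{\max}}{r\log\ell - r\log r + r}
&\iff r\log\frac{\ell}{r} + r > r\,n_{\max}\log n_{\max} \\
&\iff \log\frac{\ell}{r} + 1 > \log\bigl(n_{\max}^{\,n_{\max}}\bigr) \\
&\iff \frac{\ell}{r} > \frac{n_{\max}^{\,n_{\max}}}{e}
\end{aligned}
```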
The implied prediction is that any architecture capable of long-range coherence must have an interaction topology with higher-dimensional combinatorial structure than a 1D chain, specifically the kind of hierarchical clique organisation prevalent in biological neural and cellular systems. An alternative method the paper could have used is direct replica-method computation of quenched disorder in spin-glass formulations (as in the Edwards–Anderson model or the Hopfield-network saturation analysis of Amit, Gutfreund, and Sompolinsky 1987), which would access glassy phases that the current Landau-theory framework explicitly sets aside. The most contestable aspect is the equilibrium assumption underlying Theorem 1: the Bogoliubov inequality F ≤ F₀ + ⟨H − H₀⟩₀ bounds equilibrium (Gibbs) free energies, whereas real transformer inference is a non-equilibrium sampling process with finite compute budgets, anisotropic loss landscapes, and temperature scheduling, all of which could alter the scaling regime in ways the paper acknowledges but does not resolve. A critical reader would also push back on the treatment of attention as a fixed-window AR(ω) model: modern architectures use dynamic retrieval, sparse attention, and retrieval-augmented generation that effectively extend or restructure the interaction graph, potentially lifting the 1D topology constraint in ways not captured by the finite-ω formalism used here.
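For a classical system such as the spin models discussed here, the Bogoliubov bound follows from Jensen's inequality applied to averages ⟨·⟩₀ in the Gibbs state of a reference Hamiltonian H₀ at the same inverse temperature β = 1/T. Every quantity in it is an equilibrium object, so the bound as stated speaks only to equilibrium free energies:

```latex
% <.>_0 denotes an average in the Gibbs state of H_0 at inverse temperature beta.
\begin{aligned}
Z &= \sum_{\sigma} e^{-\beta H(\sigma)}
   = Z_0 \,\bigl\langle e^{-\beta (H - H_0)} \bigr\rangle_0
   \;\ge\; Z_0 \, e^{-\beta \langle H - H_0 \rangle_0}
   && \text{(Jensen, } e^{x} \text{ convex)} \\
F &= -\tfrac{1}{\beta}\log Z \;\le\; F_0 + \langle H - H_0 \rangle_0
\end{aligned}
```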
Methods (2)
- autoregressive modeling: statistical technique where outputs are regressed on previous values; used in language generation
- causally-masked attention: attention mechanism with a causal mask limiting each token's view to previous tokens; used in decoder-only transformers
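The finite-range structure that the paper exploits can be seen in a minimal pure-Python sketch of causally-masked attention. It assumes a single head and a window of ω positions ending at the current token (our convention, not necessarily the paper's), and checks that perturbing the first token leaves every output at position t ≥ ω unchanged:

```python
import math
import random
from typing import List

Vector = List[float]


def windowed_causal_attention(q: List[Vector], k: List[Vector], v: List[Vector],
                              omega: int) -> List[Vector]:
    """Single-head scaled dot-product attention in which position t attends
    only to positions s with t - omega < s <= t (causal, window of omega)."""
    d = len(q[0])
    out: List[Vector] = []
    for t in range(len(q)):
        visible = range(max(0, t - omega + 1), t + 1)  # causal, finite window
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d) for s in visible]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(x - top) for x in scores]
        z = sum(weights)
        out.append([sum(w * v[s][j] for w, s in zip(weights, visible)) / z
                    for j in range(len(v[0]))])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, omega = 10, 4, 3
    x = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    perturbed = [row[:] for row in x]
    perturbed[0] = [5.0] * d  # change only the first token
    a = windowed_causal_attention(x, x, x, omega)
    b = windowed_causal_attention(perturbed, perturbed, perturbed, omega)
    # Positions t >= omega never see token 0, so their outputs are identical.
    print(all(a[t] == b[t] for t in range(omega, n)))
```

This is a single layer. Stacking sliding-window layers inside a longer sequence would widen the receptive field with depth, so the sketch matches the paper's setting only when ω is read as the total context the model can see.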
Frameworks (1)
- stochastic thermodynamics: framework for non-equilibrium free energies; mentioned as a future direction for generalising the results
Findings (9)
- For a graph with independent cliques, individual cliques may flip magnetisation while remaining uniformly magnetised if the intra-clique coupling J > (T/2) log n_i (Theorem 4)
Condition for hierarchical order with locally coherent but globally varying phases
- In hierarchical systems with independent cliques, there exist parameter regimes where individual cliques maintain uniform magnetisation while others flip.
Shows how hierarchical topology enables local order within global flexibility; explains biological multiscale organization
- For one-dimensional local Hamiltonian with m>1 stored patterns at non-zero temperature, domain wall formation is thermodynamically favourable (Theorem 2)
No ordered phase in 1D with multiple stored patterns
- All local Hamiltonians on lattices with the same combinatorial structure have asymptotically equivalent free energies (Theorem 1)
Topological equivalence theorem for local Hamiltonians
- At thermal equilibrium, ability to converge to an ordered phase is independent of energy levels and window sizes (Lemma 1)
Scaling argument depends only on perimeter, not details of energy magnitudes or window length
- Autoregressive model unable to converge to a single stored pattern for any finite β (Corollary 2)
Consequence of Theorem 3 and 1D no-order result
- A unique local Hamiltonian with window length ω can be associated to any AR(ω) model (Theorem 3)
Mapping autoregressive models to spin systems
- There exists a non-empty critical temperature range of hierarchical behaviour (Proposition 3)
Proof that the conditions of Theorem 4 are realisable in a range of temperatures
- Causally-masked attention in a decoder-only model has no ordered phase (Proposition 2)
Application to transformer language models
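Theorem 3 associates a local Hamiltonian with window length ω to any AR(ω) model. A natural construction, assumed here and not necessarily the paper's, takes H to be the negative log-likelihood summed over positions, so each term couples x_t only to its ω predecessors. The chain rule then gives Z = 1 at β = 1, so Gibbs weights reproduce the model's sequence probabilities, while β ≠ 1 yields a tempered distribution. The binary alphabet and random conditional table are hypothetical:

```python
import itertools
import math
import random
from typing import Dict, Sequence, Tuple

Table = Dict[Tuple[int, ...], float]


def random_ar_table(omega: int, seed: int = 0) -> Table:
    """Hypothetical binary AR(omega) model: P(x_t = 1 | context) for every
    context of length 0..omega (short contexts cover the first omega sites)."""
    rng = random.Random(seed)
    return {ctx: rng.uniform(0.05, 0.95)
            for length in range(omega + 1)
            for ctx in itertools.product((0, 1), repeat=length)}


def local_hamiltonian(x: Sequence[int], table: Table, omega: int) -> float:
    """H(x) = -sum_t log p(x_t | x_{t-omega}, ..., x_{t-1}); each term touches
    only x_t and its omega predecessors, so H is a sum of windowed terms."""
    energy = 0.0
    for t, xt in enumerate(x):
        p1 = table[tuple(x[max(0, t - omega):t])]
        energy -= math.log(p1 if xt == 1 else 1.0 - p1)
    return energy


def partition_function(L: int, table: Table, omega: int, beta: float = 1.0) -> float:
    """Z(beta) = sum over all 2**L binary sequences of exp(-beta * H)."""
    return sum(math.exp(-beta * local_hamiltonian(x, table, omega))
               for x in itertools.product((0, 1), repeat=L))


if __name__ == "__main__":
    table = random_ar_table(omega=2, seed=1)
    print(partition_function(6, table, 2))            # ~1.0: Gibbs weights = model probabilities
    print(partition_function(6, table, 2, beta=2.0))  # < 1: needs renormalising (tempered)
```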
Claims (12)
- The inability of autoregressive large language models to maintain states of long-range order resembles tangential speech or derailment in formal thought disorder.
Analogy between LLM incoherence and schizophrenia symptoms
- The results generalise readily to non-equilibrium systems where scaling relationships remain similar (e.g., dynamic or localised scaling).
Claim about broader applicability of the scaling argument
- All intelligence is collective intelligence, in the sense that it is made of parts which must align with respect to system-level goals.
Opening axiom of the paper, a fundamental interpretive stance
- Decoder-only transformer architectures are fundamentally limited in generating long, coherent sequences due to lack of ordered phase.
Interpretation of Proposition 2 as a fundamental limitation on LLMs
- Practical context length limitations in language models lead to forgetting outside the window, constraining coherence over time.
Claim about engineering constraint reinforcing the theoretical no-order result
- The difference between simple language models and multicellular organisms goes beyond the substrate of intelligence considered.
Claim that topologies, not material substrates, account for differing organisational abilities
- Topology is the critical factor differentiating the self-organising capabilities of biological systems and language models.
Central interpretive claim of the paper: the ability to maintain long-range order is determined by interaction topology, not substrate.
- Hierarchical structures in biological systems enable local order amid global disorder, explaining complex patterning.
Claim that multiscale organisation produces complex patterns via clique-based local coherence
- Self-organisation can be viewed as a form of autopoietic cognition navigating problem spaces toward target morphologies.
Linking self-organisation to cognition and navigation of configuration space
- Hierarchical structure in interaction topology enables complex multiscale patterns that cannot exist in flat networks.
Explains why biological systems achieve organization across scales while language models struggle; grounded in free-energy scaling
Hypotheses (2)
- We hypothesise this explains why stigmergy and other forms of extracellular signalling arise in biological systems, which is known to enhance the ability for a collective system to order itself.
Hypothesis connecting fitness pressure from topological constraints to the evolutionary origin of stigmergy
- We hypothesise that an embodied world model, extending the system in space and time by its interactions with an environment, can be leveraged to maintain coherence.
Proposed solution to the topological limitation, linking embodiment to coherence
Questions (1)
- What is the functional distinction between simple language models and multicellular organisms, and can generative AI harness that property to achieve long-range order?
Core motivating question; drives investigation of topological differences between biological and artificial systems
Original abstract
All intelligence is collective intelligence, in the sense that it is made of parts which must align with respect to system-level goals. Understanding the dynamics which facilitate or limit navigation of problem spaces by aligned parts thus impacts many fields ranging across life sciences and engineering. To that end, consider a system on the vertices of a planar graph, with pairwise interactions prescribed by the edges of the graph. Such systems can sometimes exhibit long-range order, distinguishing one phase of macroscopic behaviour from another. In networks of interacting systems we may view spontaneous ordering as a form of self-organisation, modelling neural and basal forms of cognition. Here, we discuss necessary conditions on the topology of the graph for an ordered phase to exist, with an eye towards finding constraints on the ability of a system with local interactions to maintain an ordered target state. By studying the scaling of free energy under the formation of domain walls in three model systems -- the Potts model, autoregressive models, and hierarchical networks -- we show how the combinatorics of interactions on a graph prevent or allow spontaneous ordering. As an application we are able to analyse why multiscale systems like those prevalent in biology are capable of organising into complex patterns, whereas rudimentary language models are challenged by long sequences of outputs.
Related work (cited references, in-corpus papers, and external arXiv results)
Cited / in-corpus / arXiv labels show which signals surfaced each entry; entries surfaced by multiple sources are weighted higher.
- The computational boundary of a 'self': developmental bioelectricity drives multicellularity and scale-free cognition (2019) [cited, in corpus] ≈ 81%
- Living Things Are Not (20th Century) Machines: Updating Mechanism Metaphors in Light of the Modern Science of Machine Behavior (2021) [cited, in corpus] ≈ 79%
- Technological Approach to Mind Everywhere: An Experimentally-Grounded Framework for Understanding Diverse Bodies and Minds (2022) [cited, in corpus] ≈ 79%
- Self-Evidencing Through Hierarchical Gradient Decomposition: A Dissipative System That Maintains Non-Equilibrium Steady-State by Minimizing Variational Free Energy. Michael James McCulloch (2025) ≈ 83%
- Response theory and phase transitions for the thermodynamic limit of interacting identical systems (2020) [cited] ≈ 82%
- Life as we know it (2013) [in corpus] ≈ 82%
- Free Energy and Network Structure: Breaking Scale-Free Behaviour Through Information Processing Constraints. Peter R Williams, Zhan Chen (2025) ≈ 81%
- Sentient Self-Organization: Minimal dynamics and circular causality. Biswa Sengupta and Karl Friston (2017) ≈ 81%
- Knitting a Markov blanket is hard when you are out-of-equilibrium: two examples in canonical nonequilibrium models. Miguel Aguilera, Ángel Poc-López, Conor Heins, Christopher L. Buckley (2022) ≈ 81%
- A Selection Criterion for Patterns in Reaction-Diffusion Systems. Tatiana T. Marquez-Lago and Pablo Padilla (2014) ≈ 81%
- Mathematical Models of Evolution and Replicator Systems Dynamics. Chapter 1: Introduction to Replicator Systems. A.S. Bratus, S. Drozhzhin, and T. Yakushkina (2026) ≈ 80%
- Markov Blankets in the Brain. Ines Hipolito, Maxwell Ramstead, Laura Convertino, Anjali Bhat, Karl Friston, Thomas Parr (2020) ≈ 80%
- Bounded rationality for relaxing best response and mutual consistency: The Quantal Hierarchy model of decision-making. Benjamin Patrick Evans, Mikhail Prokopenko (2023) ≈ 80%
- Internalized Morphogenesis: A Self-Organizing Model for Growth, Replication, and Regeneration via Local Token Exchange in Modular Systems. Takeshi Ishida (2026) ≈ 80%
- Swarms, Phase Transitions, and Collective Intelligence. Mark M. Millonas (Center for Nonlinear Studies and Theoretical Division, LANL and Santa Fe Institute) (2008) ≈ 80%
- Minimal branching and fusion morphogenesis approaches biological multi-objective optimality. Maxime Lucas, Corentin Bisot, Giovanni Petri, Stéphane Declerck and Timoteo Carletti (2026) ≈ 80%
- Learning without neurons in physical systems (2022) [in corpus] ≈ 80%
- Information, Processes and Games [in corpus] ≈ 80%
- Darwin's agential materials: evolutionary implications of multiscale competency in developmental biology (2023) [in corpus] ≈ 79%