2025
paper:doi-10-48550-arxiv-2501-13188

Topological constraints on self-organisation in locally interacting systems

TL;DR

The topology of local interactions is the decisive factor in whether a system can sustain long-range order, and decoder-only transformer architectures are provably unable to maintain such order for arbitrarily long output sequences. By generalising the Landau–Lifshitz scaling argument and Peierls' domain-wall counting to a broad universality class via a Topological Equivalence Theorem (Theorem 1), the paper shows that any local Hamiltonian on a graph shares asymptotically equivalent free energy with a nearest-neighbour Ising model on the same combinatorial structure, so the existence or non-existence of a phase transition reduces entirely to graph topology. Three model systems are analysed: the one-dimensional windowed Potts model, AR(ω) autoregressive models (Corollary 2), and hierarchical clique networks. For the Potts chain, domain-wall entropy scales as log(L−1) while energy is bounded by ωE^max, forcing ΔF negative for sufficiently large sequence length L at any nonzero temperature. Transformer attention with a finite context window ω maps directly onto the AR(ω) framework (Proposition 2, via Theorem 3) and inherits the same no-go result. Conversely, biological systems organised as nested cliques (cells forming tissues, tissues forming organs) admit a non-empty critical temperature range of hierarchical order (Proposition 3), which exists when ℓ/r > nmax^nmax/e, where ℓ is the number of cliques, r the number flipped, and nmax the largest clique size. The paper argues that this gives a principled thermodynamic explanation for why autoregressive LLMs exhibit coherence failures on long tasks while multicellular organisms maintain large-scale morphogenetic order, and proposes that stigmergy and embodiment function as evolutionary responses to this topological no-go constraint.
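
The Potts-chain scaling argument can be checked numerically. The sketch below is not taken from the paper: the function names and the example parameter values are hypothetical, and it only evaluates the upper bound ωE^max − T log[(m−1)(L−1)] on ΔF quoted in this summary, then finds the smallest sequence length L at which that bound turns negative.

```python
import math


def delta_f_upper_bound(L, omega, e_max, T, m):
    """Upper bound on the free-energy change from one domain wall:
    omega * E_max - T * log((m - 1) * (L - 1)).
    Assumes L >= 2 sites, m >= 2 patterns and T > 0."""
    return omega * e_max - T * math.log((m - 1) * (L - 1))


def critical_length(omega, e_max, T, m):
    """Smallest integer L at which the bound is strictly negative."""
    # bound < 0  <=>  (m - 1) * (L - 1) > exp(omega * e_max / T)
    threshold = 1.0 + math.exp(omega * e_max / T) / (m - 1)
    L = max(2, math.floor(threshold) + 1)
    # Guard against floating-point ties at the threshold.
    while delta_f_upper_bound(L, omega, e_max, T, m) >= 0:
        L += 1
    return L


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    L_star = critical_length(omega=4, e_max=1.0, T=1.0, m=3)
    print("bound turns negative at L =", L_star)
```

Because the threshold grows like exp(ωE^max/T), lowering the temperature or widening the window pushes the critical length out exponentially but never removes it, which is the content of the no-go statement for any T > 0.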

What to take away

  1. Theorem 1 (the Topological Equivalence Theorem) proves that all local Hamiltonians on graphs with the same combinatorial structure have asymptotically equivalent free energies, reducing the question of phase transitions to nearest-neighbour Ising models on the same graph.
  2. For a one-dimensional windowed Potts chain of length L with context window ω and maximum window energy E^max, the change in free energy satisfies ΔF ≤ ωE^max − T log[(m−1)(L−1)], which becomes negative for sufficiently large L at any T > 0, proving no ordered phase exists (Theorem 2).
  3. Any AR(ω) autoregressive model is formally equivalent to a one-dimensional local Hamiltonian with window length ω (Theorem 3), so by Theorem 2 no autoregressive model can converge to a single stored pattern for finite inverse temperature β (Corollary 2).
  4. Causally-masked attention in a decoder-only transformer with finite context length ω maps exactly onto an AR(ω) model via the modern Hopfield network formulation, making Proposition 2 a direct corollary: such architectures have no ordered phase.
  5. Hierarchical clique graphs admit a non-empty critical temperature range of local order with global disorder when ℓ/r > nmax^nmax/e, where ℓ is the number of cliques, r the number flipped, and nmax the largest clique size (Proposition 3).
  6. Within a clique of ni spins, uniform magnetisation is maintained whenever the coupling J satisfies 2J > T log ni, providing an explicit parameter condition linking coupling strength, temperature, and clique size to coherent local order (Theorem 4).
  7. Lemma 1 shows that the asymptotics of any local Hamiltonian depend only on perimeter length P, not on energy levels E^min/E^max or window sizes ω1/ω2, since both bounds scale as O(P).
  8. An open question raised is whether glassy systems, including Hopfield networks and Edwards–Anderson spin glasses, admit analogous topological constraints on phases with frozen disorder, which the authors flag as forthcoming work.
  9. To replicate the hierarchical clique analysis, a researcher should construct a graph G with ℓ independent cliques of sizes n1,…,nℓ (all ni > 2), assume uniform coupling J, compute ΔF = Σ_γ 2J n_{i_γ} − T log C(ℓ, r), and verify the temperature inequality 2J/log(nmax) > T > 2J r nmax/(r log ℓ − r log r + r).
  10. The paper hypothesises that stigmergy and embodiment in biological systems, and cognitive behavioural therapy interventions for formal thought disorder in schizophrenia, are functional analogues that overcome topological ordering limitations by extending the effective interaction graph through environmental coupling.
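
The replication recipe in item 9 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names and example values are hypothetical, the sum in ΔF is read as running over the r flipped cliques, and C(ℓ, r) is taken to be the binomial coefficient. Rearranging the two temperature bounds shows that the window is non-empty exactly when ℓ/r > nmax^nmax/e, so the two checks should agree.

```python
import math


def delta_f_flip(J, flipped_sizes, ell, T):
    """Delta F = sum over flipped cliques of 2*J*n_i  -  T * log C(ell, r).
    Assumption: the sum runs over the r = len(flipped_sizes) flipped cliques."""
    r = len(flipped_sizes)
    energy = sum(2.0 * J * n for n in flipped_sizes)
    entropy = math.log(math.comb(ell, r))
    return energy - T * entropy


def temperature_window(J, ell, r, n_max):
    """Return (lower, upper) from
    2J / log(n_max) > T > 2J r n_max / (r log ell - r log r + r)."""
    upper = 2.0 * J / math.log(n_max)
    lower = 2.0 * J * r * n_max / (r * math.log(ell) - r * math.log(r) + r)
    return lower, upper


def window_is_nonempty(ell, r, n_max):
    """Closed-form condition ell / r > n_max**n_max / e."""
    return ell / r > n_max ** n_max / math.e


if __name__ == "__main__":
    # Hypothetical example: triangles (n_max = 3), one flipped clique.
    for ell in (9, 10):
        lower, upper = temperature_window(J=1.0, ell=ell, r=1, n_max=3)
        print(ell, lower < upper, window_is_nonempty(ell, 1, 3))
```

The steep nmax^nmax growth means that even modest clique sizes require a very large number of cliques before the window opens.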

Peer brief — for seminar discussion

The paper derives necessary topological conditions for the existence of an ordered phase in systems with local interactions, unifying spin-model physics, autoregressive language models, and hierarchical biological networks under a single thermodynamic framework. Starting from the Landau–Lifshitz domain-wall scaling argument for the 1D Ising model and Peierls' 2D counterargument, the paper introduces the Topological Equivalence Theorem (Theorem 1), which establishes that any local Hamiltonian—defined as a sum of windowed Hamiltonians of finite interaction range ω—has asymptotically equivalent free energy to a nearest-neighbour Ising model on the same graph. This means the capacity for self-organisation depends only on the combinatorial topology of the interaction graph, not on the number of stored patterns m, the specific energy levels E^max or E^min, or the window size ω. Three model systems are worked through: the one-dimensional windowed Potts model (Theorem 2), AR(ω) autoregressive models (Theorem 3, Corollary 2), and hierarchical clique networks (Theorem 4, Proposition 3). The load-bearing finding is that because decoder-only transformer architectures perform autoregression over a finite context window ω, they map directly onto a 1D local Hamiltonian, and therefore provably cannot sustain long-range order—a formal thermodynamic explanation for the empirically observed coherence failures in long-task performance documented in benchmarks such as Kwa et al. 2025 (arXiv:2503.14499). Conversely, biological multiscale systems organised as nested cliques admit a non-empty critical temperature range satisfying 2J/log(nmax) > T > 2Jrnmax/(r log ℓ − r log r + r), enabling local order within cliques while global disorder persists across them, which the paper argues recapitulates the tissue-level coherence and organ-level heterogeneity seen in morphogenesis. 
The implied prediction is that any architecture capable of long-range coherence must have an interaction topology with higher-dimensional combinatorial structure than a 1D chain—specifically, the kind of hierarchical clique organisation prevalent in biological neural and cellular systems. An alternative method the paper could have used is direct replica-method computation of quenched disorder in spin-glass formulations (as in the Edwards–Anderson model or Hopfield network saturation analysis of Amit, Gutfreund, and Sompolinsky 1987), which would access glassy phases the current Landau-theory framework explicitly sets aside. The most contestable aspect is the equilibrium assumption underlying Theorem 1: the Bogoliubov inequality bound F ≤ F0 + ⟨H − H0⟩0 is tight only near equilibrium, and real transformer inference is a non-equilibrium sampling process with finite compute budgets, anisotropic loss landscapes, and temperature scheduling—all of which could alter the scaling regime in ways the paper acknowledges but does not resolve. A critical reader would also push back on the treatment of attention as a fixed-window AR(ω) model: modern architectures use dynamic retrieval, sparse attention, and retrieval-augmented generation that effectively extend or restructure the interaction graph, potentially lifting the 1D topology constraint in ways that are not captured by the finite-ω formalism used here.

Methods (2)

  • autoregressive modeling
    Statistical technique where outputs are regressed on previous values; used in language generation
  • causally-masked attention
    Attention mechanism with causal mask limiting each token's view to previous tokens; used in decoder-only transformers
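
A causal mask with a finite context window can be written down directly. The sketch below is illustrative only and uses plain Python lists rather than any particular deep-learning library. The function names are hypothetical, and it assumes the window ω counts the current token plus the ω − 1 tokens before it.

```python
import math


def causal_window_mask(seq_len, window):
    """mask[i][j] is True when position i may attend to position j.
    Assumption: position i sees itself and the (window - 1) positions before it."""
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def masked_softmax(scores, mask_row):
    """Softmax over the visible positions only; masked positions get weight 0."""
    visible = [s for s, keep in zip(scores, mask_row) if keep]
    peak = max(visible)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if keep else 0.0
            for s, keep in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    mask = causal_window_mask(seq_len=5, window=2)
    # Token 3 sees only tokens 2 and 3.
    print(mask[3])
    print(masked_softmax([0.0] * 5, mask[3]))
```

Each row of the mask has at most ω visible entries, so every output token depends on a bounded number of predecessors, which is the locality property the AR(ω) mapping relies on.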

Original abstract

All intelligence is collective intelligence, in the sense that it is made of parts which must align with respect to system-level goals. Understanding the dynamics which facilitate or limit navigation of problem spaces by aligned parts thus impacts many fields ranging across life sciences and engineering. To that end, consider a system on the vertices of a planar graph, with pairwise interactions prescribed by the edges of the graph. Such systems can sometimes exhibit long-range order, distinguishing one phase of macroscopic behaviour from another. In networks of interacting systems we may view spontaneous ordering as a form of self-organisation, modelling neural and basal forms of cognition. Here, we discuss necessary conditions on the topology of the graph for an ordered phase to exist, with an eye towards finding constraints on the ability of a system with local interactions to maintain an ordered target state. By studying the scaling of free energy under the formation of domain walls in three model systems -- the Potts model, autoregressive models, and hierarchical networks -- we show how the combinatorics of interactions on a graph prevent or allow spontaneous ordering. As an application we are able to analyse why multiscale systems like those prevalent in biology are capable of organising into complex patterns, whereas rudimentary language models are challenged by long sequences of outputs.
