Autoregressive models and context window limitations

Theoretical and empirical analysis of why autoregressive (AR) language models cannot maintain coherence, or converge to a stored pattern, beyond their context window through local interactions alone.

14 members.

Drawn from 9 sources

The papers/notes whose extracted claims & findings make up this cluster.

Topological constraints on self-organisation in locally interacting systems (4 members)
Paper Summary: Interpreting Language Model Parameters (2 members)
2026-05-14_phil-trans-A-goodfire-aboutblank-impact.md (2 members)
koan-battery-section.md (1 member)
Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations (1 member)
Denotational design with type class morphisms (extended version) (1 member)
Topological constraints on self-organization in locally interacting systems (1 member)
Denotational Design: from meanings to programs (1 member)
Genuinely Functional User Interfaces (1 member)

Bridges (7)

Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.

Mechanistic interpretability & model evaluation (14 shared)
Autoregressive LLMs & formal thought disorder (4 shared)
Deferred approximation composition principle (2 shared)
Alexander's 15 Properties for AI Interface Aliveness (1 shared)
Sparse autoencoder interpretability limits (1 shared)
Denotational design for GUIs (1 shared)
LLM introspective awareness of injected concepts (1 shared)

Claims (10)

Approximations and prunings compose badly, so postpone them.
Formal denotational models of GUIs enable program verification, equivalence reasoning, and systematic extension to new paradigms.
Language models implement algorithms humans have tried and failed to write by hand for decades. Opening interpretive claim about the remarkable nature of language models.
Postpone Approximations. Approximations and prunings compose badly; it is cleaner to maintain precise infinite semantics until final extraction.
Practical context length limitations in language models lead to forgetting outside the window, constraining coherence over time. Claim about an engineering constraint reinforcing the theoretical no-order result.
Sparse autoencoders don't provide a comprehensive solution because they decode activations, not parameters. Critique of activation-based interpretability methods.
The inability of autoregressive large language models to maintain states of long-range order resembles tangential speech or derailment in formal thought disorder. Analogy between LLM incoherence and schizophrenia symptoms.
Autoregressive language models cannot converge to single stored patterns beyond their context window from local interactions alone.
LLM tangential speech or derailment formally resembles clinical formal thought disorder in schizophrenia.
Roughness in responses decreases with parameter count within same-alignment model families, operationalizing the cost of polishing.

Findings (4)

A unique local Hamiltonian with window length ω can be associated to any AR(ω) model (Theorem 3). Mapping autoregressive models to spin systems.
Autoregressive model unable to converge to a single stored pattern for any finite β (Corollary 2). Consequence of Theorem 3 and 1D no-order result.
Model precomputes answers before tool invocation and attends to cached answer over tool output when discrepancy exists, confirmed via attribution graphs. Mechanistic insight surfaced by NLA explanations and validated through independent causal attribution method.
Rudimentary language models are challenged by long sequences of outputs. Empirical observation explained by topological constraints: flat autoregressive architectures lack multiscale structure needed for long-range order.
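
The no-order finding (Corollary 2) can be illustrated in the simplest case: window length ω = 1 with binary tokens treated as spins ±1. This is a minimal sketch under stated assumptions, not the construction used in the source paper. It assumes an open 1D Ising chain with nearest-neighbour coupling J and no external field, which is equivalent to a symmetric two-state Markov chain. The function names `eps_from_beta` and `spin_correlation` are hypothetical.

```python
import math

def eps_from_beta(beta: float, coupling: float = 1.0) -> float:
    """Flip probability of the AR(1) chain equivalent to an open 1D Ising
    chain at inverse temperature beta (assumption: no external field)."""
    return 1.0 / (1.0 + math.exp(2.0 * beta * coupling))

def spin_correlation(eps: float, k: int) -> float:
    """Exact correlation E[s_0 * s_k] for a symmetric binary AR(1) chain
    in which each new spin flips relative to the previous one with
    probability eps."""
    p_same = 1.0  # probability that s_t equals s_0, starting at t = 0
    for _ in range(k):
        # s_{t+1} matches s_0 if it matched and did not flip,
        # or if it mismatched and did flip.
        p_same = p_same * (1.0 - eps) + (1.0 - p_same) * eps
    return 2.0 * p_same - 1.0

if __name__ == "__main__":
    for beta in (0.5, 2.0, 5.0):
        eps = eps_from_beta(beta)
        row = [spin_correlation(eps, k) for k in (1, 10, 100, 1000)]
        print(f"beta={beta}: " + ", ".join(f"{c:.4f}" for c in row))
```

The correlation equals (1 − 2·eps)^k, which is tanh(βJ)^k under the Ising assumption. For any finite β this base is strictly below 1, so the correlation decays to zero with distance: raising β lengthens the decay scale but never produces long-range order.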