finding
active

Causally-masked attention in a decoder-only model has no ordered phase (Proposition 2)

Application to transformer language models

Source paper

extracted_from
Topological constraints on self-organisation in locally interacting systems
(2025) · Francesco Sacco · Dalton A R Sakthivadivel · Michael Levin

Neighborhood — ranked by edge-count

Claims (1)

claim

Communities (4)

community

Concepts (1)

concept
  • Attention restricted to previous tokens only, as in decoder-only models; leads to AR(ω)-like behaviour and no ordered phase
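As an illustration of the restriction described above, the following is a minimal sketch of single-head attention with a causal mask, written with NumPy. It is not taken from the source paper: the function name `causal_attention` is hypothetical, learned projections are omitted, and the input is random. It only shows that each position attends to itself and earlier positions, so a token's output depends on its past alone, as in an autoregressive process.

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    Hypothetical helper for illustration; q, k, v have shape (n_tokens, d).
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # True above the diagonal: position i must not see positions j > i.
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Assumed toy input: 5 tokens with 4 features each.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
out, w = causal_attention(x, x, x)

# Perturbing the last token leaves all earlier outputs unchanged.
x2 = x.copy()
x2[-1] += 1.0
out2, _ = causal_attention(x2, x2, x2)
```

The weight matrix `w` is lower triangular, and `out[:-1]` equals `out2[:-1]`: information flows in one direction only, from earlier tokens to later ones.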

Frameworks (2)

framework
  • Neural network architecture based on attention, commonly used in large language models
  • Second model system studied; used to show why flat autoregressive LLMs struggle with long-range coherence.
