transformer architecture

Neural network architecture based on attention, commonly used in large language models

Neighborhood — ranked by edge-count

thinker

Ashish Vaswani
introduces
Lead author of 'Attention Is All You Need', the paper that introduced the transformer architecture

concept

Large Language Models (LLMs)
implements
Transformer-based models such as GPT-4, LaMDA, and PaLM; assessed for Global Workspace Theory (GWT) indicators.

framework

Topological constraints on self-organisation framework
cites
Main framework: uses the scaling of free energy under domain-wall formation to determine, from graph topology alone, whether local interactions can sustain ordered phases
Autoregressive models
extends
Second model system studied; used to show why flat autoregressive LLMs struggle with long-range coherence.
AR(ω) model
implements
Stochastic process model that predicts the next token from a context window of length ω; mapped to a local Hamiltonian

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Transformer decoder architecture · framework · 0.846
Base architecture of reasoning LLMs studied, with attention and MLP blocks per layer
A Mathematical Framework for Transformer Circuits · framework · 0.761
Prior Anthropic paper enabling circuit-level analysis of attention-only transformers; motivates current MLP decomposition
Signal Transformer · concept · 0.759
Core abstraction in Fruit: pure function mapping signals to signals; enables compositional GUI definitions.
A Mathematical Framework for Transformer Circuits (Elhage et al., 2021) · concept · 0.757
Foundational mechanistic interpretability paper on transformer circuit analysis
Agent architectures · concept · 0.755
The varied neural network architectures used in the RL experiments to test whether the alignment phenomenon generalizes across architectures.
techno-architecture · concept · 0.754
Architecture of the 21st Century · concept · 0.738
Alexander's projected future architecture using ultramodern materials and process-based techniques to achieve living structure unlike 20th-century mechanical repetition.
Zero-Layer Transformer · concept · 0.737
A transformer with no attention layers; shown to model bigram statistics via T = W_U W_E
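
The zero-layer relation T = W_U W_E can be illustrated with a minimal sketch. The matrix values, the toy sizes (3 tokens, width 2), and the helper names `matmul`, `softmax`, and `next_token_distribution` are all hypothetical. The sketch assumes a column-vector convention in which column j of W_E embeds token j, so column j of T holds the next-token logits given current token j.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy weights, chosen only for illustration.
W_E = [[1.0, 0.0, 1.0],   # embedding, shape (d_model, vocab)
       [0.0, 1.0, 1.0]]
W_U = [[2.0, 0.0],        # unembedding, shape (vocab, d_model)
       [0.0, 2.0],
       [1.0, 1.0]]

# With no attention or MLP layers, the whole model is one matrix product.
T = matmul(W_U, W_E)      # shape (vocab, vocab)

def next_token_distribution(token):
    """Bigram-style prediction: depends only on the current token."""
    logits = [T[i][token] for i in range(len(T))]
    return softmax(logits)
```

Because the prediction reads a single column of T, it depends only on the current token, which is why such a model can capture at most bigram statistics.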