concept
active
concept:zero-layer-transformer

Zero-Layer Transformer

A transformer with no attention or MLP layers; shown to model bigram statistics via the direct path T = W_U W_E
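The direct path can be sketched in a few lines of NumPy. This is a minimal illustration, not the original analysis code. The function name zero_layer_logits, the shape convention (W_E as d_model × n_vocab, W_U as n_vocab × d_model), and the random weights are assumptions. It shows that the logits at each position are just the column of T = W_U W_E selected by the current token. The second check confirms that two occurrences of the same token get identical predictions.

```python
import numpy as np

def zero_layer_logits(tokens: np.ndarray, W_E: np.ndarray, W_U: np.ndarray) -> np.ndarray:
    """Logits of a zero-layer transformer: embed, then unembed, nothing in between.

    Assumed shapes: W_E is (d_model, n_vocab), W_U is (n_vocab, d_model),
    tokens is a 1-D array of token ids. Returns (len(tokens), n_vocab).
    """
    T = W_U @ W_E              # (n_vocab, n_vocab): the direct-path matrix
    return T[:, tokens].T      # column t of T holds the logits after token t

# Hypothetical random weights, for illustration only.
rng = np.random.default_rng(0)
n_vocab, d_model = 5, 3
W_E = rng.normal(size=(d_model, n_vocab))
W_U = rng.normal(size=(n_vocab, d_model))
tokens = np.array([0, 3, 3, 1])

logits = zero_layer_logits(tokens, W_E, W_U)

# Same computation step by step: embed each token into the residual stream, then unembed.
resid = W_E[:, tokens]                       # (d_model, seq_len)
assert np.allclose(logits, (W_U @ resid).T)

# Positions 1 and 2 hold the same token, so their predictions are identical:
# the output depends only on the current token.
assert np.allclose(logits[1], logits[2])
```

Because T factors through d_model dimensions, its rank is at most d_model, so when d_model is smaller than the vocabulary, T can only approximate a full bigram table.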

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • Next-token probabilities conditioned only on the current token; what zero-layer transformers optimally approximate, and what the direct path W_U W_E contributes to in every transformer
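To make "optimally approximate" concrete, here is a minimal sketch that estimates log P(next | current) from a token stream and treats it as the direct-path matrix. The helper names bigram_log_probs and softmax_columns, the [next, current] column convention, the toy token stream, and the additive smoothing are all assumptions for illustration. The sketch ignores the rank limit that d_model places on W_U W_E. It also shows why the optimum is only defined up to a per-column shift: adding a constant to every logit in a column leaves the softmax unchanged.

```python
from typing import Sequence

import numpy as np

def bigram_log_probs(tokens: Sequence[int], n_vocab: int, smoothing: float = 1e-3) -> np.ndarray:
    """Estimate log P(next | current) as an (n_vocab, n_vocab) matrix indexed [next, current].

    Additive smoothing (an assumption) keeps unseen pairs at finite log-probabilities.
    """
    counts = np.full((n_vocab, n_vocab), smoothing)
    for cur, nxt in zip(tokens[:-1], tokens[1:]):
        counts[nxt, cur] += 1
    return np.log(counts / counts.sum(axis=0, keepdims=True))

def softmax_columns(logits: np.ndarray) -> np.ndarray:
    """Softmax over each column (one column per current token)."""
    z = logits - logits.max(axis=0, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

# Hypothetical toy token stream over a 3-token vocabulary.
stream = [0, 1, 2, 0, 1, 0, 1, 2]
T_star = bigram_log_probs(stream, n_vocab=3)
probs = softmax_columns(T_star)
print(probs[:, 0].round(3))   # next-token distribution after token 0

# Adding any per-column constant leaves the predictions unchanged.
shifted = T_star + np.array([1.0, 2.0, 3.0])
assert np.allclose(softmax_columns(shifted), probs)
```

Using the log-probabilities as logits reproduces the bigram table exactly, which is what minimizes cross-entropy when the model can see only the current token.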

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The first toy model analyzed; shown to implement an ensemble of bigram and skip-trigram models readable directly from weights
  • Signal Transformer · concept · 0.752
    Core abstraction in Fruit: pure function mapping signals to signals; enables compositional GUI definitions.
  • The primary model analyzed; uses attention head composition, especially K-composition, to create induction heads for powerful in-context learning
  • Two-layer transformer with rotary positional encodings used in numeric task experiments.
  • Neural network architecture based on attention, commonly used in large language models
  • The transformer version directly analogous to TEM, introduced in this paper, offering dramatic performance improvements.
  • Hypothesis that neocortical circuits beyond hippocampus may implement transformer-like computations for language and other domains.
  • Metric measuring the fraction of the MLP's loss contribution explained by the autoencoder, computed by replacing MLP activations with autoencoder outputs
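The loss-recovered metric in the last entry can be sketched as a ratio of loss differences. The exact formula is an assumption here. This version uses the common form (L_ablated − L_autoencoder) / (L_ablated − L_clean) with zero-ablation as the baseline, and some work uses mean-ablation instead. The function name fraction_loss_recovered is hypothetical, and the numbers in the usage line are illustrative, not measured results.

```python
def fraction_loss_recovered(loss_clean: float, loss_ablated: float, loss_autoencoder: float) -> float:
    """Share of the MLP's loss contribution that the autoencoder reconstruction recovers.

    loss_clean:       loss with the original MLP activations.
    loss_ablated:     loss with the MLP activations ablated (zero-ablation assumed here).
    loss_autoencoder: loss with MLP activations replaced by autoencoder outputs.
    Returns 1.0 for a perfect reconstruction and 0.0 for one no better than ablation.
    """
    contribution = loss_ablated - loss_clean   # how much the MLP lowers the loss
    if contribution == 0:
        raise ValueError("the MLP has no loss contribution to explain")
    return (loss_ablated - loss_autoencoder) / contribution

# Illustrative numbers only, not measured results.
print(fraction_loss_recovered(loss_clean=3.0, loss_ablated=5.0, loss_autoencoder=3.5))  # → 0.75
```

Normalizing by the ablation gap makes the score comparable across layers whose MLPs matter to very different degrees.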