concept
active
concept:bigram-statistics

Bigram Statistics

Next-token probabilities conditioned only on the current token. Zero-layer transformers can at best approximate these statistics, and the direct path W_U W_E contributes a bigram-like term to the logits in every transformer.
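
As a rough illustration of the definition above, bigram statistics can be estimated by counting adjacent token pairs and normalizing per current token. This is a minimal sketch; the function name `bigram_probs` and the toy sentence are assumptions made for the example, not part of the source.

```python
from collections import Counter, defaultdict

def bigram_probs(tokens):
    """Estimate P(next | current) from counts of adjacent token pairs."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    # Normalize each row so the probabilities for a given current token sum to 1.
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }

# Hypothetical toy corpus, used only for illustration.
tokens = "the cat sat on the mat the cat ran".split()
probs = bigram_probs(tokens)
```

Here `probs["the"]` depends only on how often each token followed "the", with no information from any earlier context.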

Neighborhood — ranked by edge-count

Concepts (2)

concept
  • A three-token pattern of the form [source]...[destination][out] that one-layer attention heads implement; the paper's key characterization of one-layer transformer behavior
  • A transformer with no attention layers; shown to model bigram statistics via T = W_U W_E
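
The relation T = W_U W_E in the second item can be sketched numerically. The example below assumes the column convention (logits = W_U W_E t for a one-hot token vector t), random weights, and toy sizes; the helper names `matmul`, `softmax`, and `next_token_probs` are hypothetical and chosen only for this illustration.

```python
import math
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
vocab, d_model = 5, 3  # toy sizes, assumed for illustration

# W_E maps a one-hot token to the residual stream (d_model x vocab).
W_E = [[random.gauss(0, 1) for _ in range(vocab)] for _ in range(d_model)]
# W_U maps the residual stream to logits (vocab x d_model).
W_U = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(vocab)]

# T[i][j] is the logit for next token i given current token j.
T = matmul(W_U, W_E)

def next_token_probs(current):
    """Distribution over the next token given only the current token id."""
    column = [T[i][current] for i in range(vocab)]
    return softmax(column)
```

Because the prediction for a position reads only one column of T, it depends on the current token alone. T also has rank at most d_model, which is one reason a zero-layer model approximates bigram statistics rather than reproducing them exactly.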

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Engram · concept · 0.700
    Physical/informational substrate of memory; reframed not as static encoded detail but as prompt requiring creative interpretation by receiving system.
  • Wolfram Research · institute · 0.696
    Stephen Wolfram's organization
  • LaBraM · framework · 0.688
    EEG transformer foundation model for brain activity analysis, one of the three architectures studied.
  • BIG-bench · framework · 0.684
    Large-scale collaborative benchmark for LLM capabilities, cited.
  • Algorithm used to calibrate per-latent threshold boost values for consistent first-attempt difficulty
  • Parameters of the approximate posterior, such as Dirichlet counts for model parameters.
  • Methods for bottom-up model space construction; contrasted with top-down BMR approach of this paper
  • Big Two Model · framework · 0.660
    Meta-trait model grouping OCEAN traits into stability (C, A, reversed N) and plasticity (E, O); used to evaluate covariance patterns from injections