concept
active
Bigram Statistics (concept:bigram-statistics)
Next-token probabilities conditioned only on the current token. Zero-layer transformers optimally approximate them, and in every transformer the direct path W_U W_E contributes to them.
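As a concrete illustration, an empirical bigram table can be estimated by counting adjacent token pairs. This is a minimal sketch assuming whitespace-tokenized text; the helper name `bigram_probs` is hypothetical and not from the source. The resulting table P(next | current) is the kind of statistic that a zero-layer transformer approximates.

```python
from collections import Counter, defaultdict


def bigram_probs(tokens):
    """Estimate P(next | current) by counting adjacent token pairs (hypothetical helper)."""
    pair_counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        pair_counts[cur][nxt] += 1
    # Normalize each row of counts into a conditional distribution.
    probs = {}
    for cur, counter in pair_counts.items():
        total = sum(counter.values())
        probs[cur] = {nxt: c / total for nxt, c in counter.items()}
    return probs


tokens = "the cat sat on the mat".split()
table = bigram_probs(tokens)
print(table["the"])  # {'cat': 0.5, 'mat': 0.5}
```

Tokens that never appear in a non-final position (here "mat") have no row, because nothing is ever observed after them.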
Neighborhood — ranked by edge-count
Concepts (2)
concept
- Skip-Trigram · extends · A three-token pattern of the form [source]...[destination][out] that one-layer attention heads implement; the paper's key characterization of one-layer transformer behavior
- Zero-Layer Transformer · implements · A transformer with no attention layers; shown to model bigram statistics via T = W_U W_E
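To see why a zero-layer transformer can only express bigram statistics, note that its logits for a token t are column t of T = W_U W_E. The following is a minimal sketch under stated assumptions: toy random weights, no positional embeddings, no layer norm, and W_E stored as d_model × vocab (columns are token embeddings). The helper names are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Plain-Python matrix product of A (n x k) and B (k x m)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


vocab, d_model = 4, 3
rng = random.Random(0)
# Assumed toy weights: W_E is d_model x vocab, W_U is vocab x d_model.
W_E = [[rng.gauss(0, 1) for _ in range(vocab)] for _ in range(d_model)]
W_U = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(vocab)]
T = matmul(W_U, W_E)  # vocab x vocab "bigram logit" matrix


def zero_layer_forward(tokens):
    """Next-token distribution at each position; each depends only on that position's token."""
    outputs = []
    for t in tokens:
        residual = [W_E[r][t] for r in range(d_model)]  # embed: column t of W_E
        logits = [sum(W_U[v][r] * residual[r] for r in range(d_model))
                  for v in range(vocab)]  # unembed: equals column t of T
        outputs.append(softmax(logits))
    return outputs


out = zero_layer_forward([2, 0, 2])
```

Because nothing mixes information across positions, the two occurrences of token 2 receive identical predictions regardless of what precedes them.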
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Physical/informational substrate of memory; reframed not as static encoded detail but as prompt requiring creative interpretation by receiving system.
- Stephen Wolfram's organization
- EEG transformer foundation model for brain activity analysis, one of the three architectures studied.
- Large-scale collaborative benchmark for LLM capabilities, cited.
- Algorithm used to calibrate per-latent threshold boost values for consistent first-attempt difficulty
- Parameters of the approximate posterior, such as Dirichlet counts for model parameters.
- Methods for bottom-up model space construction; contrasted with top-down BMR approach of this paper
- Meta-trait model grouping OCEAN traits into stability (C, A, reversed N) and plasticity (E, O); used to evaluate covariance patterns from injections