Trigram Features

Features implementing specific three-token sequence predictions (e.g., predicting '19' after 'COVID-')

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Skip-Trigram · concept · 0.758
A three-token pattern of the form [source]...[destination][out] that one-layer attention heads implement; the paper's key characterization of one-layer transformer behavior
Skip-Trigram Bugs · concept · 0.740
Model failures in which a one-layer attention head, because it factors the three-way interaction, cannot raise the probability of the intended token combinations without also raising the probability of unintended ones
Geometry of features · concept · 0.735
Research thread within About Blank concerning the structure and relational properties of neural network feature representations; covariance pooling tangentially supports this thread.
Feature Visualization · method · 0.727
Method of optimizing an input so that a neuron fires maximally, used to characterize what the neuron detects; establishes a causal link
Fourier features · concept · 0.719
Features identified in Llama-3.1-8B that compute sums using periods respecting base-10 addition (2, 5, 10) rather than concept-specific periods
Linear Representation of Features · concept · 0.699
The central object of study — the idea that a concept like truth is encoded as a direction in the LLM's latent space
Feature Manifolds · framework · 0.691
Hypothesized extension of superposition where features may be higher-dimensional manifolds rather than 1D directions
atomic features · concept · 0.689
The idea that interpretability should decompose representations into minimal, indivisible feature units; contrasted with manifold-level descriptions.