Skip-Trigram

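A three-token pattern of the form [source]... [destination][out] that one-layer attention heads implement: when the destination token attends back to an earlier source token, the head raises the probability that the out token comes next. This is the paper's key characterization of one-layer transformer behavior.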

Neighborhood — ranked by edge-count

Papers (1)

paper

A Mathematical Framework for Transformer Circuits
introduces

Concepts (5)

concept

Skip-Trigram Bugs
related_to
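Model failures in which a one-layer attention head, because it factors the three-way interaction into a source-destination (attention) term and a source-output term, cannot raise the probability of the intended token combinations without also raising the probability of unintended ones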
in-context learning (ICL)
associated_with
Test-time adaptation from prompt or retrieved context with no parameter updates.
Induction Heads
extends
Mechanistic circuits in transformers documented by Olsson et al. 2022, cited as evidence for the pattern-repository assumption
One-Layer Attention-Only Transformer
implements
The first toy model analyzed; shown to implement an ensemble of bigram and skip-trigram models readable directly from weights
Bigram Statistics
extends
Next-token probabilities conditioned only on the present token; what zero-layer transformers optimally approximate and what the direct path W_U W_E contributes to in all transformers
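
The factoring behind skip-trigram bugs can be illustrated with a small numeric sketch. It assumes a deliberately simplified picture in which, for one fixed source token, a head's contribution to the (destination, out) pair is the product of an attention weight (destination to source) and a logit boost (source to out). All numbers and token choices below are hypothetical and not read from any trained model, and the sketch ignores softmax normalization across source positions.

```python
import numpy as np

# Hypothetical toy values for a single source token, e.g. "keep".
destinations = ["in", "at"]
outputs = ["mind", "bay"]

# Attention side: how strongly each destination token attends back to the source.
attn_to_source = np.array([0.9, 0.8])  # indexed by destination

# Output side: how much attending to the source raises each out-token logit.
ov_logit = np.array([2.0, 1.5])  # indexed by out token

# Factored contribution: one value per (destination, out) pair.
# Because it is an outer product, this table always has rank 1.
contribution = np.outer(attn_to_source, ov_logit)

# What we would want: boost "in mind" and "at bay", but not the crossed pairs.
target = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

rank_contribution = np.linalg.matrix_rank(contribution)
rank_target = np.linalg.matrix_rank(target)

for i, d in enumerate(destinations):
    for j, o in enumerate(outputs):
        print(f"keep ... {d} {o}: {contribution[i, j]:.2f}")
```

Under these assumptions the target table has rank 2 while the factored table has rank 1, so no choice of the two vectors reproduces it: boosting "keep ... in mind" and "keep ... at bay" also boosts "keep ... in bay" and "keep ... at mind".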

Related by similarity (6)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Trigram Features · concept · 0.758
Features implementing specific three-token sequence predictions (e.g., predicting '19' after 'COVID-')
causal bypassing · concept · 0.694
Confound where naming injected concepts reflects direct logit effects rather than metacognitive awareness, raised by Morris & Plunkett
Shadow · method · 0.667
Attribute: exposing latent tendencies of a text, what isn't said but could be, a haunting presence.
Pause-Check-Correct-Proceed · concept · 0.666
Proposed constitutional article defining mindful reflection steps in CCAI implementation
ReAct · framework · 0.657
Prior framework for synergizing reasoning and acting in LLM agents, foundational to agent harness concept
Gradient Descent · method · 0.655
Used for updating hidden state expectations; provides dynamical process theory testable against neuronal data