concept
active
concept:skip-trigram-bugs
Skip-Trigram Bugs
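Model failures in which a one-layer attention head that raises the probability of intended skip-trigrams must also raise the probability of unintended token combinations, because it factors the three-way interaction among source, destination, and output tokens into separate attention (QK) and output (OV) terms. For example, a head that boosts both "keep ... in mind" and "keep ... at bay" must also boost "keep ... in bay" and "keep ... at mind".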
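The factoring described above can be illustrated with a minimal sketch. It assumes a deliberately simplified head whose logit contribution is the product of one attention weight (destination attending to the source token "keep") and one output boost (what attending to "keep" adds to each candidate next token). The token choices, the numeric values, and the names `attention_to_keep`, `ov_boost_from_keep`, and `logit_effect` are hypothetical, and the sketch ignores softmax competition between source positions and the contributions of other heads.

```python
# Toy illustration only: hypothetical values, not measurements from any model.

# Attention weight each destination token places on the source token "keep"
# (stands in for the QK side of the head).
attention_to_keep = {"in": 0.9, "at": 0.8}

# Logit boost that attending to "keep" gives each output token
# (stands in for the OV side of the head).
ov_boost_from_keep = {"mind": 2.0, "bay": 1.5}


def logit_effect(destination: str, output: str) -> float:
    """Factored head contribution: attention weight times output boost."""
    return attention_to_keep[destination] * ov_boost_from_keep[output]


intended = [("in", "mind"), ("at", "bay")]
unintended = [("in", "bay"), ("at", "mind")]

for destination, output in intended + unintended:
    label = "intended" if (destination, output) in intended else "unintended"
    effect = logit_effect(destination, output)
    print(f"keep ... {destination} {output}: {effect:+.2f} ({label})")
```

Because each effect is a product of one per-destination term and one per-output term, making both intended combinations positive forces both cross combinations to be positive as well. A head that could assign an independent value to every (destination, output) pair would not be constrained this way.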
Neighborhood — ranked by edge-count
Papers (1)
paper
Claims (1)
claim
- Early example of using mechanistic interpretability to understand unintended model behavior
Concepts (1)
concept
- Skip-Trigram (related_to): A three-token pattern of the form [source]... [destination][out] that one-layer attention heads implement; the paper's key characterization of one-layer transformer behavior
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Features implementing specific three-token sequence predictions (e.g., predicting '19' after 'COVID-')
- Confound where naming injected concepts reflects direct logit effects rather than metacognitive awareness, raised by Morris & Plunkett
- Used for updating hidden state expectations; provides dynamical process theory testable against neuronal data
- Mechanism for encoding sequence order in transformers; paper argues these should reflect learned structural representations rather than fixed sines/cosines.
- Strategic filtering procedure that removes invalid trajectories and maintains optimal positive-to-negative trajectory ratio to stabilize training.
- The progressive reduction of error (stress) as cells move toward their target positions.
- Traditional parallel programming model requiring explicit point-to-point communication; Linda generalizes this via tuple spaces.
- Edge-to-edge coverings of a surface with no overlaps or gaps