Previous-token attention behavior
concept:previous-token-attention-behavior · concept · active
An attention algorithm, recovered by VPD, in which the model attends to the immediately preceding token.
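The behavior can be made concrete with a minimal sketch. It assumes that an attention pattern is available as a query-by-key matrix whose rows sum to 1. An idealized previous-token pattern puts all of row i on column i-1, and a simple diagnostic averages that sub-diagonal. The function names, the choice to let position 0 attend to itself, and the uniform causal baseline are illustrative assumptions. This is a common per-head heuristic, not the VPD procedure. Because the related finding reports the behavior as distributed across several heads, a per-head score like this could understate it.

```python
import numpy as np

def previous_token_pattern(seq_len: int) -> np.ndarray:
    """Idealized previous-token attention: query i puts all weight on key i-1.

    Assumption: position 0 has no predecessor, so it attends to itself.
    """
    attn = np.zeros((seq_len, seq_len))
    attn[0, 0] = 1.0
    for i in range(1, seq_len):
        attn[i, i - 1] = 1.0
    return attn

def uniform_causal_pattern(seq_len: int) -> np.ndarray:
    """Baseline: each query i spreads its weight evenly over keys 0..i."""
    mask = np.tril(np.ones((seq_len, seq_len)))
    return mask / mask.sum(axis=1, keepdims=True)

def previous_token_score(attn: np.ndarray) -> float:
    """Mean attention weight from query i to key i-1, over i >= 1.

    np.diagonal(attn, offset=-1) returns the sub-diagonal entries attn[i, i-1].
    """
    return float(np.diagonal(attn, offset=-1).mean())

if __name__ == "__main__":
    n = 6
    print(previous_token_score(previous_token_pattern(n)))            # 1.0
    print(round(previous_token_score(uniform_causal_pattern(n)), 3))  # 0.29
```

The uniform baseline gives query i a weight of 1/(i+1) on its predecessor. For this reason, a real previous-token head should score well above that baseline rather than merely above zero.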
Neighborhood (ranked by edge count)
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- A pair of query and key subcomponents distributed across attention heads performs previous-token behavior · finding · 0.842
  VPD recovers an attention algorithm for attending to the previous token, distributed across multiple heads.
- An attention head that primarily attends to the immediately preceding token; key building block for induction heads via K-composition
- Behavior where information about full clauses is encoded over clause-ending punctuation tokens in LLMs
- The training objective of LLMs: predicting the next token given the preceding context, i.e. modeling P(w_{n+1} | w_1, ..., w_n)
- Token-level supervision enables models to learn functional-token invocation from reasoning context · claim · 0.730
  ATLAS author's assertion that functional tokens optimized via standard cross-entropy loss learn when and how to invoke operations from surrounding text.
- Describes the properties of the functional token.
- Feature that fires on a specific token only within a specific surrounding context (e.g., 'the' in physics vs 'the' in mathematics)
- Interesting special case of copying behavior related to tokenization artifacts; primitive precursor to induction heads
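The similarity list applies two stated criteria: cosine ≥ 0.65 and no typed edge. The filter below is a minimal sketch of such a rule. It assumes that each entity has an embedding vector and that typed edges are stored as unordered pairs. The embedding source, the data layout, and the function name `similar_without_edge` are hypothetical, because the page does not describe how its neighborhood is actually computed.

```python
import numpy as np

def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Entities whose cosine similarity to `target` is >= threshold and that
    share no typed edge with it, sorted by descending similarity.

    embeddings:  dict of entity id -> 1-D vector (hypothetical layout)
    typed_edges: set of frozenset({a, b}) pairs that already have a typed relation
    """
    t = np.asarray(embeddings[target], dtype=float)
    t = t / np.linalg.norm(t)
    hits = []
    for name, vec in embeddings.items():
        if name == target or frozenset((target, name)) in typed_edges:
            continue  # skip self and entities already linked by a typed edge
        v = np.asarray(vec, dtype=float)
        norm = np.linalg.norm(v)
        if norm == 0:
            continue  # cosine is undefined for a zero vector
        cos = float(t @ v / norm)
        if cos >= threshold:
            hits.append((name, cos))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy usage with made-up 3-D embeddings.
emb = {
    "prev-token": [1, 0, 0],
    "prev-token-head": [0.9, 0.1, 0],
    "next-token": [0.7, 0.7, 0],
    "unrelated": [0, 0, 1],
}
edges = {frozenset(("prev-token", "prev-token-head"))}
print(similar_without_edge("prev-token", emb, edges))
```

In the toy data, "prev-token-head" is highly similar but is excluded because it already has a typed edge, which matches the "no typed edge" condition.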
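One related entry describes the previous-token head as a key building block for induction heads via K-composition. The standard account runs as follows:

1. A previous-token head writes the identity of token k-1 into position k.
2. The induction head's keys read that information.
3. The query at the current position matches any key whose preceding token equals the current token.
4. The head then copies the token at that key.

The sketch below is a symbolic toy version of that account. It replaces soft attention with exact matching and picks the most recent match. Both simplifications and all function names are assumptions for illustration, not a result from VPD.

```python
def previous_token_head(tokens):
    """Toy previous-token head: position i receives token i-1 (None at position 0)."""
    return [None] + tokens[:-1]

def induction_predict(tokens):
    """Toy induction head predicting the token after the last position.

    K-composition: key k carries prev[k], the previous-token head's output.
    The query (current token) matches key k when prev[k] == tokens[-1], i.e.
    token k followed an earlier occurrence of the current token; the head
    then copies tokens[k]. Exact matching stands in for soft attention.
    """
    prev = previous_token_head(tokens)
    current = tokens[-1]
    # Exclude the last position itself; use the most recent earlier match.
    matches = [k for k in range(len(tokens) - 1) if prev[k] == current]
    return tokens[matches[-1]] if matches else None

print(induction_predict(["A", "B", "C", "D", "A", "B"]))  # C
```

Without the previous-token head's output in the keys, the query has nothing to match on, which is why that head is described as a building block.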