Summarization Token Behavior

Behavior in which an LLM encodes information about a full clause over the clause-ending punctuation token
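To make the two candidate read-out positions concrete, here is a minimal sketch that pairs each clause-ending punctuation token with the content token just before it. The helper name `clause_readout_positions`, the punctuation set, and the word-level token list are all assumptions for illustration; a real tokenizer would produce subword tokens and the indices would differ.

```python
# Hypothetical helper: not taken from any paper or library.
CLAUSE_END = {".", ",", ";", ":", "!", "?"}  # assumed set of clause-ending tokens


def clause_readout_positions(tokens):
    """Return (last_content_idx, punctuation_idx) pairs, one per clause.

    The second index is where summarization behavior would place the
    clause-level information; the first is the final content token.
    """
    pairs = []
    for i, tok in enumerate(tokens):
        # Only count punctuation that directly follows a non-punctuation token.
        if tok in CLAUSE_END and i > 0 and tokens[i - 1] not in CLAUSE_END:
            pairs.append((i - 1, i))
    return pairs


tokens = ["Paris", "is", "in", "France", ".", "Rome", "is", "in", "Italy", "."]
print(clause_readout_positions(tokens))  # [(3, 4), (8, 9)]
```

A probing experiment would then compare hidden states read at the first index of each pair against those read at the second.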

Neighborhood — ranked by edge-count

paper

concept

Summarization Behavior
related_to
The phenomenon where LLMs encode clause-level information over clause-ending punctuation tokens rather than the final content token

finding

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Previous-token attention behavior · concept · 0.748
An attention algorithm recovered by VPD where the model attends to the immediately preceding token.
LLaMA-2-70B displays summarization behavior over punctuation tokens in a context-dependent way: present for cities but not for sp_en_trans · finding · 0.722
Contrasts with 7B and 13B which show consistent summarization behavior; may complicate localization at 70B scale
Token-in-Context Feature · concept · 0.714
Feature that fires on a specific token only within a specific surrounding context (e.g., 'the' in physics vs 'the' in mathematics)
A pair of query and key subcomponents distributed across attention heads performs previous-token behavior · finding · 0.713
VPD recovers an attention algorithm for attending to the previous token, distributed across multiple heads.
Token · concept · 0.709
Basic unit of LLM input/output: words, parts of words, punctuation marks, emojis
tokenizer vocabulary · concept · 0.705
The standard set of tokens that the functional token remains a part of.
Single-Token Features · concept · 0.700
Features that fire on every instance of a single token; appear in small dictionaries as collapsed versions of many token-in-context features
Token-level supervision enables models to learn functional-token invocation from reasoning context · claim · 0.699
ATLAS author's assertion that functional tokens optimized via standard cross-entropy loss learn when and how to invoke operations from surrounding text.
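
The "cosine ≥ 0.65 · no typed edge" filter that produces the list above can be sketched as follows. This is an illustration only: the function names, the dictionary-of-vectors layout, and the toy embeddings are assumptions, and the page does not say how its embeddings or scores are actually computed.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """List entities similar to `target` that have no typed edge to it.

    embeddings:  dict mapping entity name -> vector (assumed layout)
    typed_edges: set of entity names already linked to `target`
    """
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in typed_edges:
            continue  # skip the entity itself and already-linked neighbors
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, round(score, 3)))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy vectors, invented for the example.
toy = {"target": [1.0, 0.0], "linked": [1.0, 0.0],
       "near": [1.0, 1.0], "far": [0.0, 1.0]}
print(untyped_neighbors("target", toy, typed_edges={"linked"}))
```

Entities returned by such a filter are only candidates: a high score may mean a missing edge or an unrecognized duplicate, and that still has to be decided by inspection.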