Token

The basic unit of LLM input and output: a word, part of a word, punctuation mark, or emoji

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Functional Token · concept · 0.823
A discrete token in the vocabulary that represents a visual operation (e.g., <|Line|>, <|Shape|>, <|Text|>), generated via next-token prediction within autoregressive sequences.
Token embeddings · concept · 0.808
Vector representations of individual tokens from genomic foundation models; the raw inputs to sequence pooling methods.
Token-in-Context Feature · concept · 0.788
Feature that fires on a specific token only within a specific surrounding context (e.g., 'the' in physics vs 'the' in mathematics)
tokenizer vocabulary · concept · 0.779
The standard set of tokens that the functional token remains a part of.
Digit-token logit distribution · concept · 0.760
Full distribution over tokens 0-9 at first generation step; contains more information than any single sampled token
Next Token Prediction · concept · 0.760
The training objective of LLMs: predicting the most likely next token given context; formally P(w_{n+1}|w_1...w_n)
Anomalous Tokens · concept · 0.751
Extremely rare or never-used vocabulary elements that may distort logit weight analysis; excluded from feature analysis
Previous Token Head · concept · 0.748
An attention head that primarily attends to the immediately preceding token; key building block for induction heads via K-composition
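
The Next Token Prediction entry above writes the objective as P(w_{n+1}|w_1...w_n). As a rough illustration only, the sketch below turns a set of logits into that distribution with a softmax and then picks the most likely token. The vocabulary and logit values are made up for the example and do not come from any real model or tokenizer.

```python
import math


def softmax(logits):
    """Convert raw scores (logits) into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy vocabulary and logits for a single context (assumed values).
vocab = ["the", "cat", "sat", "."]
logits = [2.0, 0.5, 1.0, -1.0]

# probs[i] plays the role of P(w_{n+1} = vocab[i] | w_1...w_n).
probs = softmax(logits)

# Greedy decoding: take the token with the highest probability.
next_token = vocab[probs.index(max(probs))]
```

The full `probs` list keeps the relative weight of every candidate, whereas `next_token` keeps only one outcome. This is the sense in which the Digit-token logit distribution entry says a full distribution contains more information than any single sampled token.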