Next Token Prediction

The training objective of LLMs: predicting the next token given the preceding context; formally, estimating the conditional distribution P(w_{n+1} | w_1, ..., w_n).
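Training on this objective means minimizing the average negative log-likelihood (cross-entropy) of each observed next token. The idea can be illustrated with a much simpler model than an LLM. Below is a minimal Python sketch. It assumes a toy whitespace-tokenized corpus and a count-based bigram model that conditions only on the previous token. That is a Markov simplification of the full context w_1, ..., w_n an LLM actually uses. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Assumption: a bigram model stands in for an LLM. It approximates
# P(w_{n+1} | w_1, ..., w_n) by P(w_{n+1} | w_n), using only the previous token.

def train_bigram(tokens):
    """Count how often each token follows each previous token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_prob(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 if prev was never seen."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[nxt] / total if total else 0.0

def predict_next(counts, prev):
    """Greedy prediction: the most frequent follower of prev (prev must be seen)."""
    return counts[prev].most_common(1)[0][0]

def ntp_loss(counts, tokens):
    """Mean negative log-likelihood (cross-entropy) of each next token.

    Assumes every (prev, next) pair occurred in training; an unseen pair has
    probability 0 and math.log would raise ValueError.
    """
    pairs = list(zip(tokens, tokens[1:]))
    nll = sum(-math.log(next_token_prob(counts, p, n)) for p, n in pairs)
    return nll / len(pairs)

corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
print(next_token_prob(model, "the", "cat"))  # 0.666... ("the" is followed by "cat" 2 of 3 times)
print(predict_next(model, "the"))            # cat
print(ntp_loss(model, corpus))               # about 0.412
```

A lower loss means the model assigns higher probability to the tokens that actually come next. An LLM replaces the count table with a Transformer that conditions on the whole prefix, but the quantity being minimized has the same form.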

Neighborhood (ranked by edge count)

claim

concept

Large Language Models (LLMs)
implements
Transformer-based models such as GPT-4, LaMDA, and PaLM; assessed for Global Workspace Theory (GWT) indicators.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Next-Token Prediction (NTP) · method · 0.882
Training objective used for all neural network models in the paper; cross-entropy loss over predicted token sequences.

Stochastic text generation (next token prediction) · method · 0.852
The core mechanism of LLMs: predicting the next token based on previous context.

Token · concept · 0.760
Basic unit of LLM input/output: words, parts of words, punctuation marks, emojis.

Previous Token Head · concept · 0.747
An attention head that primarily attends to the immediately preceding token; a key building block for induction heads via K-composition.

Previous-token attention behavior · concept · 0.740
An attention algorithm recovered by VPD where the model attends to the immediately preceding token.

Token-in-Context Feature · concept · 0.733
Feature that fires on a specific token only within a specific surrounding context (e.g., 'the' in physics vs. 'the' in mathematics).

Emoticon continuation prediction · concept · 0.718
The functional role of a specific VPD subcomponent in predicting emoticon/emoji continuations after punctuation.

Token embeddings · concept · 0.715
Vector representations of individual tokens from genomic foundation models; the raw inputs to sequence pooling methods.
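The "Stochastic text generation (next token prediction)" entry describes how a trained model is used at inference time. The model outputs a score (logit) for every token in its vocabulary, and the next token is drawn from the softmax of those scores. Below is a minimal Python sketch. It assumes a hypothetical three-token vocabulary and hand-picked logits rather than output from a real model. The `temperature` parameter is a common decoding knob that this entry does not itself specify.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw per-token scores (logits) into probabilities that sum to 1."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next(vocab, logits, temperature=1.0, rng=None):
    """Stochastic decoding: draw one token in proportion to its probability."""
    if rng is None:
        rng = random.Random()
    return rng.choices(vocab, weights=softmax(logits, temperature), k=1)[0]

def greedy_next(vocab, logits):
    """Deterministic decoding: always pick the highest-scoring token."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return vocab[best]

# Hypothetical vocabulary and logits for illustration only (not from a real model).
vocab = ["cat", "dog", "mat"]
logits = [2.0, 1.0, 0.1]
rng = random.Random(0)
print(greedy_next(vocab, logits))  # cat
print(softmax(logits))
print([sample_next(vocab, logits, rng=rng) for _ in range(5)])
```

Lowering the temperature sharpens the distribution toward the greedy choice, while raising it flattens the distribution and makes less likely tokens appear more often.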