Next-Token Prediction (NTP)

Training objective used for all neural network models in the paper: a cross-entropy loss between the model's predicted distribution at each position and the token that actually comes next in the sequence.
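The plain-Python function below is a minimal sketch of this objective. It assumes the common setup in which the model outputs one vector of vocabulary scores (logits) per input position and the per-token losses are averaged. The source does not say how the paper reduces the loss over positions, and `next_token_loss` is a hypothetical name, not the paper's code.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of vocabulary scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def next_token_loss(logits: Sequence[Sequence[float]], tokens: Sequence[int]) -> float:
    """Mean cross-entropy of predicting tokens[t + 1] from logits[t].

    logits[t] holds the model's unnormalized scores over the vocabulary after
    reading tokens[0..t]. The last position has no next token, so it is skipped.
    Averaging over positions is an assumption; summing is also common.
    """
    if len(logits) != len(tokens):
        raise ValueError("need one logit vector per input token")
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to form a prediction target")
    losses = []
    for t in range(len(tokens) - 1):
        target = tokens[t + 1]
        # Negative log-probability the model assigns to the true next token.
        losses.append(-log_softmax(logits[t])[target])
    return sum(losses) / len(losses)
```

For example, uniform logits over a vocabulary of size V give a loss of log V, which is a convenient sanity check for an untrained model.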

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Next Token Prediction · concept · 0.882
The training objective of LLMs: predicting the next token given the preceding context, i.e. modeling P(w_{n+1} | w_1, …, w_n)
Stochastic text generation (next token prediction) · method · 0.808
The core mechanism of LLMs: predicting the next token based on previous context.
Beyond NLP · concept · 0.712
Extension of compositional methods beyond natural language processing to general AI and hardware
A pair of query and key subcomponents distributed across attention heads performs previous-token behavior · finding · 0.701
VPD recovers an attention algorithm for attending to the previous token, distributed across multiple heads.
zero-shot prediction · concept · 0.700
Prediction without task-specific training; Evee achieves 0.991 AUROC on indels in zero-shot mode.
Normative predictions · concept · 0.694
In active inference, predictions about what should be, held to motivate action; overflow leads to tanha.
Prediction Error · concept · 0.693
Role in optimizing sensory states; a unified treatment shows that value learning and perception share an error-minimization principle.
Previous-token attention behavior · concept · 0.692
An attention algorithm recovered by VPD where the model attends to the immediately preceding token.
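The "Stochastic text generation (next token prediction)" entry describes generation as repeatedly sampling from the model's next-token distribution. The function below is a minimal sketch of one such sampling step. It assumes the model's output at the last position is a list of logits. Temperature scaling and the greedy fallback at temperature 0 are common decoding conventions added here for illustration. `sample_next_token` is a hypothetical helper, not an API from the paper or any library.

```python
import math
import random
from typing import Optional, Sequence


def sample_next_token(logits: Sequence[float], temperature: float = 1.0,
                      rng: Optional[random.Random] = None) -> int:
    """Draw one token id from softmax(logits / temperature).

    temperature == 0 falls back to greedy decoding (argmax).
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    if rng is None:
        rng = random.Random()
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    # Unnormalized softmax weights; random.choices normalizes them itself.
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

Generating text repeats this step. The sampled id is appended to the context, the model recomputes logits, and the next token is sampled. Lower temperatures concentrate probability on the highest-scoring tokens, and higher temperatures flatten the distribution.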
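The two previous-token entries describe attention that targets the immediately preceding token. The toy below shows what that attention pattern looks like in a single causal head. Its query and key are hand-set from one-hot position vectors, and the scale `beta` is an illustrative assumption. It is not VPD, and it does not reproduce the distributed, multi-head mechanism the entries report.

```python
import math
from typing import List


def one_hot(index: int, size: int) -> List[float]:
    """One-hot vector; an out-of-range index (e.g. -1) yields all zeros."""
    return [1.0 if k == index else 0.0 for k in range(size)]


def previous_token_attention(seq_len: int, beta: float = 10.0) -> List[List[float]]:
    """Attention weights of one causal head hand-built to attend to token i - 1.

    Row i holds softmax weights over keys j = 0..i (causal mask).
    """
    keys = [one_hot(j, seq_len) for j in range(seq_len)]
    weights = []
    for i in range(seq_len):
        # The query for position i points at position i - 1, scaled by beta.
        # Position 0 has no predecessor, so its query is the zero vector.
        query = [beta * x for x in one_hot(i - 1, seq_len)]
        scores = [sum(q * k for q, k in zip(query, keys[j])) for j in range(i + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights
```

Raising `beta` sharpens the pattern toward a hard previous-token lookup. In a trained model the same effect has to come from learned weights, and the entries above indicate it may be spread across several heads.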