Stochastic text generation (next token prediction)

The core mechanism of LLMs: predicting the next token based on previous context.
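Generation is stochastic because the model produces a probability distribution over the next token rather than a single token. One token is sampled from that distribution and appended to the context, and the loop repeats. Below is a minimal Python sketch of that loop. It assumes a hypothetical toy bigram table (`TOY_LOGITS`) in place of a real model; a real LLM would condition on the whole context w_1, ..., w_n. The `temperature` parameter and the helper names are also illustrative assumptions and do not describe any particular model's decoder.

```python
import math
import random

# Hypothetical toy "model": next-token logits conditioned only on the previous token.
# This bigram table is an illustrative assumption, not a real LLM.
TOY_LOGITS = {
    "<s>": {"the": 2.0, "a": 1.0},
    "the": {"cat": 1.5, "dog": 1.5},
    "a": {"cat": 1.0, "dog": 1.0},
    "cat": {"sat": 2.0, "</s>": 0.5},
    "dog": {"sat": 2.0, "</s>": 0.5},
    "sat": {"</s>": 3.0},
}


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = {tok: s / temperature for tok, s in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}


def sample_next(context, rng, temperature=1.0):
    """Draw one token from P(next | context). Here only context[-1] is used."""
    probs = softmax(TOY_LOGITS[context[-1]], temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


def generate(rng, max_tokens=10, temperature=1.0):
    """Autoregressive loop: sample a token, append it, and repeat until </s>."""
    context = ["<s>"]
    for _ in range(max_tokens):
        tok = sample_next(context, rng, temperature)
        context.append(tok)
        if tok == "</s>":
            break
    return context[1:]  # drop the start symbol
```

For example, `generate(random.Random(0))` returns a short token list that ends in `"</s>"`. The exact tokens depend on the seed. As `temperature` decreases, probability mass concentrates on the highest-scoring token and sampling approaches greedy decoding.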

Neighborhood (ranked by edge count)

Inner monologue / chain-of-thought in LLMs · concept · associated_with
The hidden reasoning steps that recent LLMs generate before their visible output; mentioned in the technology section.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.

Next Token Prediction · concept · 0.852
The training objective of LLMs: predicting the most likely next token given the preceding context, formally P(w_{n+1} | w_1, ..., w_n).
Next-Token Prediction (NTP) · method · 0.808
Training objective used for all neural network models in the paper; cross-entropy loss over predicted token sequences.
stochastic parrot · concept · 0.733
Derogatory term for LLMs; Nix's commentary frames it as the camp opposite to the view that LLMs show nascent consciousness.
A pair of query and key subcomponents distributed across attention heads performs previous-token behavior · finding · 0.720
VPD recovers an attention algorithm for attending to the previous token, distributed across multiple heads.
Previous-token attention behavior · concept · 0.718
An attention algorithm recovered by VPD where the model attends to the immediately preceding token.
Zhang et al. 2023: STORM: Efficient Stochastic Transformer based World Models (NeurIPS 2023) · concept · 0.716
World model framework extended and used in the present implementation
Reasoning models generate performative CoT tokens after achieving strong confidence in their final answer without revealing this belief in text · claim · 0.710
The central empirical claim of the paper, supported by activation probing evidence
GPT does not generate rollouts during training, so there is no reason to expect that GPT will form preferences over the consequences of its output related to the text prediction objective. · claim · 0.710
Argues against instrumental convergence in GPT.