method
active
method:stochastic-text-generation-next-token-prediction
Stochastic text generation (next token prediction)
The core mechanism of LLMs: predicting the next token based on previous context.
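As a minimal sketch of what "stochastic" next-token prediction means, the following Python example samples each token from a probability distribution over candidates rather than always taking the single most likely one. The `TOY_LOGITS` table, the token names, and the helper functions are hypothetical and exist only for illustration; a real LLM computes its scores with a neural network conditioned on the whole preceding context, not just the last token.

```python
import math
import random

# Hypothetical toy "model": maps the last token to unnormalized scores (logits)
# for each candidate next token. This is an assumption for illustration only;
# a real LLM conditions on the entire context.
TOY_LOGITS = {
    "<s>": {"the": 2.0, "a": 1.0},
    "the": {"cat": 1.5, "dog": 1.2},
    "a": {"cat": 1.0, "dog": 1.0},
    "cat": {"sat": 2.0, "</s>": 0.5},
    "dog": {"sat": 1.0, "</s>": 1.0},
    "sat": {"</s>": 3.0},
}


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next_token(context, rng, temperature=1.0):
    """Draw one next token at random, weighted by its predicted probability."""
    probs = softmax(TOY_LOGITS[context[-1]], temperature)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate(rng, max_tokens=10, temperature=1.0):
    """Repeatedly sample a token and append it to the context until end-of-sequence."""
    context = ["<s>"]
    for _ in range(max_tokens):
        tok = sample_next_token(context, rng, temperature)
        if tok == "</s>":
            break
        context.append(tok)
    return context[1:]  # drop the start marker


if __name__ == "__main__":
    print(" ".join(generate(random.Random(0))))
```

Because each step is a random draw, different seeds can produce different continuations from the same starting context. Lowering `temperature` toward zero makes the draw behave more and more like always picking the highest-scoring token.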
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Inner monologue / chain-of-thought in LLMs · associated_with · The hidden reasoning steps generated by recent LLMs before visible output; mentioned in the technology section.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The training objective of LLMs: predicting the most likely next token given context; formally P(w_{n+1} | w_1, ..., w_n).
- Training objective used for all neural network models in the paper; cross-entropy loss over predicted token sequences.
- Derogatory term for LLMs; Nix's commentary frames it as camp opposite to nascent consciousness.
- A pair of query and key subcomponents distributed across attention heads performs previous-token behavior · finding · 0.720 · VPD recovers an attention algorithm for attending to the previous token, distributed across multiple heads.
- An attention algorithm recovered by VPD where the model attends to the immediately preceding token.
- Zhang et al. 2023: STORM: Efficient Stochastic Transformer based World Models (NeurIPS 2023) · concept · 0.716 · World model framework extended and used in the present implementation.
- The central empirical claim of the paper, supported by activation probing evidence
- Argues against instrumental convergence in GPT.