concept
active
concept:autoregressive-parallelization
autoregressive parallelization
The training parallelization technique with which latent methods are difficult to train.
Neighborhood — ranked by edge-count
Papers (1)
paper
Claims (1)
claim
- Identifies key limitations of latent methods.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The mechanism by which LLMs generate text: repeatedly drawing a token from the next-token distribution and appending it to the context.
- Baseline persistence of any probe direction, arising from the autoregressive nature of LLMs and not specific to emotion content.
- Transformers are recurrent through autoregression because the K/V stream provides horizontal information flow across positions, even though each forward pass is feedforward.
- Second model system studied; used to show why flat autoregressive LLMs struggle with long-range coherence.
- Statistical technique in which outputs are regressed on previous values; used in language generation.
- Training objective that can be interpreted as optimizing a diverse set of tasks, and is therefore subject to multitask scaling convergence pressures.
- Attribute: an attempt at dualism and dialogue, running texts alongside each other, but inherently unstable.