concept
active
Autoregressive Language Modeling (concept:autoregressive-language-modeling)
A training objective that can be interpreted as optimizing a diverse set of tasks, and is therefore subject to multitask-scaling convergence pressures.
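One standard way to see the multitask reading (this identity is textbook probability, not a formula given in this entry) is the chain rule. It splits the negative log-likelihood of a sequence $x_{1:T}$ under a model $p_\theta$ into one next-token prediction term per position. Each term conditions on a different context $x_{<t}$, so minimizing the sum trains a single model on many distinct conditional prediction problems at once:

```latex
-\log p_\theta(x_{1:T}) \;=\; \sum_{t=1}^{T} -\log p_\theta\!\left(x_t \mid x_{<t}\right)
```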
Neighborhood (ranked by edge count)
Hypotheses (1)
hypothesis
- Multitask Scaling Hypothesis (supports): argues that fewer representations are competent for N tasks than for M < N tasks, so more general models have a smaller solution space.
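The counting argument behind the hypothesis can be illustrated with a toy example. This is a hypothetical sketch, not the hypothesis's own formalism. It assumes that a "representation" is any Boolean function on 3-bit inputs (256 candidates) and that each "task" fixes the required output on a few inputs. The functions competent on all of the first k tasks form an intersection, so that set can only stay the same size or shrink as tasks are added. In this toy every task constrains new inputs, so it shrinks strictly.

```python
from itertools import product

# Hypothetical toy setup: a "representation" is any Boolean function on 3-bit inputs,
# stored as a tuple of 8 outputs (one per input), giving 2**8 = 256 candidates.
INPUTS = list(product([0, 1], repeat=3))
CANDIDATES = list(product([0, 1], repeat=len(INPUTS)))

# Hypothetical tasks: each one fixes the required output on some inputs.
TASKS = [
    {(0, 0, 0): 0, (1, 1, 1): 1},  # task 1
    {(0, 1, 0): 1},                # task 2
    {(1, 0, 0): 0, (0, 0, 1): 1},  # task 3
]

def competent(func, task):
    """True if func produces the required output on every input the task constrains."""
    return all(func[INPUTS.index(x)] == y for x, y in task.items())

def solution_space_sizes(tasks):
    """Count candidates competent on all of the first k tasks, for k = 0..len(tasks)."""
    return [
        sum(all(competent(f, t) for t in tasks[:k]) for f in CANDIDATES)
        for k in range(len(tasks) + 1)
    ]

print(solution_space_sizes(TASKS))
```

Each constrained input halves the candidate count here because the tasks never overlap. Tasks that repeat constraints would leave the count unchanged rather than shrinking it.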
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- A statistical technique in which each output is regressed on its own previous values; used in language generation.
- Second model system studied; used to show why flat autoregressive LLMs struggle with long-range coherence.
- Primary substrate for manifold steering experiments; demonstrates method on reasoning and in-context tasks.
- Primary test domain for manifold steering, including reasoning and ICL tasks
- The mechanism by which LLMs generate text: drawing a token from the next-token distribution and appending it to context repeatedly
- The training-parallelization technique that is difficult to use when training latent methods.
- Baseline persistence of any probe direction arising from the autoregressive nature of LLMs, not specific to emotion content
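The sampling mechanism described above (draw a token from the next-token distribution, append it to the context, repeat) fits in a short loop. This is a minimal sketch under loud assumptions. The hypothetical `NEXT_TOKEN_PROBS` table stands in for a real LLM and conditions only on the last token, whereas an actual model conditions on the whole context. The `<bos>`/`<eos>` markers and all helper names are illustrative and do not come from this entry.

```python
import random

# Hypothetical toy "model": next-token probabilities conditioned only on the last token.
# A real LLM would condition on the entire context; this table stands in for that call.
NEXT_TOKEN_PROBS = {
    "<bos>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.3, "dog": 0.7},
    "cat": {"sat": 0.8, "<eos>": 0.2},
    "dog": {"ran": 0.8, "<eos>": 0.2},
    "sat": {"<eos>": 1.0},
    "ran": {"<eos>": 1.0},
}

def next_token_distribution(context):
    """Stand-in for the model: return {token: probability} given the context so far."""
    return NEXT_TOKEN_PROBS[context[-1]]

def sample(max_new_tokens=10, seed=0):
    """Autoregressive sampling: draw a token, append it to the context, repeat."""
    rng = random.Random(seed)
    context = ["<bos>"]
    for _ in range(max_new_tokens):
        tokens, probs = zip(*next_token_distribution(context).items())
        token = rng.choices(tokens, weights=probs, k=1)[0]  # draw one token
        context.append(token)                               # feed it back in as context
        if token == "<eos>":
            break
    return context

print(sample(seed=0))
```

Swapping `rng.choices` for taking the most probable token would turn this into greedy decoding. The loop structure, where each output becomes part of the next input, stays the same.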