Autoregressive Language Modeling

Training objective that can be interpreted as optimizing a diverse set of tasks, and is therefore subject to multitask scaling convergence pressures

Neighborhood — ranked by edge-count

hypothesis

Multitask Scaling Hypothesis
supports
Argues that fewer representations are competent for N tasks than for M < N tasks, so more general models have a smaller solution space

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

autoregressive modeling · method · 0.900
Statistical technique where outputs are regressed on previous values; used in language generation
Autoregressive models · framework · 0.869
Second model system studied; used to show why flat autoregressive LLMs struggle with long-range coherence.
Language Models · concept · 0.830
Primary substrate for manifold steering experiments; demonstrates method on reasoning and in-context tasks.
Language Model · concept · 0.814
Primary test domain for manifold steering, including reasoning and ICL tasks
Autoregressive Sampling · method · 0.806
The mechanism by which LLMs generate text: drawing a token from the next-token distribution and appending it to context repeatedly
autoregressive parallelization · concept · 0.804
The training parallelization technique that latent methods are difficult to train with.
autoregressive persistence · concept · 0.789
Baseline persistence of any probe direction arising from the autoregressive nature of LLMs, not specific to emotion content
Autoregressive language models cannot converge to single stored patterns beyond their context window from local interactions alone. · claim · 0.788
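
Illustration for the Autoregressive Sampling entry above: a minimal sketch of the loop it describes (draw a token from the next-token distribution, append it to the context, repeat). The names sample_autoregressively and toy_model, the "<eos>" stop token, and the bigram table are hypothetical and chosen only for illustration; a real LLM would condition on the whole context rather than on the last token alone.

```python
import random


def sample_autoregressively(next_token_probs, prompt, max_new_tokens,
                            eos=None, rng=None):
    """Repeatedly sample one token and append it to the context.

    next_token_probs: callable mapping the current context (a list of tokens)
    to a dict {token: probability}. This stands in for a language model.
    """
    rng = rng or random.Random()
    context = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(context)
        tokens = list(probs)
        weights = [probs[t] for t in tokens]
        # Draw exactly one token from the next-token distribution.
        token = rng.choices(tokens, weights=weights, k=1)[0]
        # Append it, so the next step conditions on the extended context.
        context.append(token)
        if token == eos:
            break
    return context


def toy_model(context):
    """Hypothetical stand-in model: a bigram table keyed on the last token."""
    table = {
        "the": {"cat": 0.5, "dog": 0.5},
        "cat": {"sat": 1.0},
        "dog": {"sat": 1.0},
        "sat": {"<eos>": 1.0},
    }
    return table[context[-1]]


if __name__ == "__main__":
    print(sample_autoregressively(toy_model, ["the"], 10, eos="<eos>"))
```

Each call to next_token_probs sees every token sampled so far, which is what makes the procedure autoregressive: earlier samples become part of the input for later ones.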