autoregressive persistence

Baseline persistence that any probe direction shows because LLMs are autoregressive; it is not specific to emotion content
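As a toy illustration of this baseline (not the LLM setting itself), the sketch below simulates a hypothetical isotropic AR(1) hidden state and measures lag autocorrelation along a random unit direction. The AR(1) process, the coefficient `a`, and the helper names `simulate_ar1` and `lag_autocorr` are all assumptions made for illustration. The only point is that an arbitrary direction inherits persistence from the recurrence, whatever the direction encodes.

```python
import numpy as np

def simulate_ar1(n_steps=5000, dim=32, a=0.9, seed=0):
    """Toy hidden state (assumption): h[t] = a * h[t-1] + Gaussian noise."""
    rng = np.random.default_rng(seed)
    h = np.zeros((n_steps, dim))
    for t in range(1, n_steps):
        h[t] = a * h[t - 1] + rng.standard_normal(dim)
    return h

def lag_autocorr(signal, lag):
    """Autocorrelation of a 1-D signal at a fixed lag (assumed persistence metric)."""
    s = signal - signal.mean()
    return float(np.dot(s[:-lag], s[lag:]) / np.dot(s, s))

h = simulate_ar1()

# A random unit direction with no special meaning.
rng = np.random.default_rng(1)
direction = rng.standard_normal(h.shape[1])
direction /= np.linalg.norm(direction)

# The projection is itself a scalar AR(1) process, so its lag-5
# autocorrelation is close to a**5 in theory, up to sampling noise.
rho = lag_autocorr(h @ direction, lag=5)
```

In this toy process every direction persists to roughly the same degree, which is why a probe's persistence needs a baseline before it can be attributed to the probe's content.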

Neighborhood — ranked by edge-count

method

Variance-Matched Random Probe Comparison
about
Controls for explained variance: for each emotion probe, samples random directions from top-k PC subspaces so that they match the probe's explained variance, then subtracts the median persistence of 20 matched directions
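A minimal sketch of how such a control could be computed is given below. It rests on several assumptions that are not stated in the entry: activations are an `(n_tokens, d)` array `X`, persistence is measured as lag autocorrelation of the projected signal, and "matching" means choosing the `k` for which a uniformly random unit vector in the top-k PC subspace has expected explained variance closest to the probe's. All function names are hypothetical.

```python
import numpy as np

def explained_variance(X, direction):
    """Fraction of total variance of X captured along a direction."""
    Xc = X - X.mean(axis=0)
    d = direction / np.linalg.norm(direction)
    return (Xc @ d).var() / Xc.var(axis=0).sum()

def lag_autocorr(signal, lag=10):
    """Assumed persistence metric: autocorrelation at a fixed lag."""
    s = signal - signal.mean()
    return float(np.dot(s[:-lag], s[lag:]) / np.dot(s, s))

def matched_random_directions(X, probe, n_dirs=20, rng=None):
    """Sample unit directions from the top-k PC subspace matched to the probe."""
    rng = np.random.default_rng(rng)
    Xc = X - X.mean(axis=0)
    # PCA via SVD: rows of Vt are principal axes, S**2 is proportional to variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_frac = S**2 / np.sum(S**2)
    target = explained_variance(X, probe)
    # A uniformly random unit vector in span(top-k PCs) has expected
    # explained variance equal to the mean of the top-k variance fractions.
    mean_topk = np.cumsum(var_frac) / np.arange(1, len(var_frac) + 1)
    k = int(np.argmin(np.abs(mean_topk - target))) + 1
    coeffs = rng.standard_normal((n_dirs, k))
    dirs = coeffs @ Vt[:k]
    return dirs / np.linalg.norm(dirs, axis=1, keepdims=True), k

def residual_persistence(X, probe, n_dirs=20, lag=10, rng=None):
    """Probe persistence minus the median persistence of matched directions."""
    dirs, _ = matched_random_directions(X, probe, n_dirs, rng)
    probe_p = lag_autocorr(X @ probe, lag)
    baseline = np.median([lag_autocorr(X @ d, lag) for d in dirs])
    return probe_p - baseline
```

The median is used here, rather than the mean, so that a single outlying random direction does not shift the baseline. The match is in expectation only; individual sampled directions will scatter around the probe's explained variance.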

concept

autoregressive recurrence
related_to
Transformers are recurrent through autoregression because the K/V stream provides horizontal information flow across positions, even though each forward pass is feedforward.
residual persistence
extends
Emotion feature persistence beyond what high explained variance alone would predict, computed by subtracting the median persistence of variance-matched probes

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Autoregressive models · framework · 0.828
Second model system studied; used to show why flat autoregressive LLMs struggle with long-range coherence.
Autoregressive Sampling · method · 0.819
The mechanism by which LLMs generate text: drawing a token from the next-token distribution and appending it to context repeatedly
autoregressive parallelization · concept · 0.817
The training parallelization technique that latent methods are difficult to train with.
autoregressive modeling · method · 0.809
Statistical technique where outputs are regressed on previous values; used in language generation
Autoregressive Language Modeling · concept · 0.789
Training objective interpretable as optimizing a diverse set of tasks; thus subject to multitask scaling convergence pressures
Autoregressive model unable to converge to a single stored pattern for any finite β (Corollary 2) · finding · 0.787
Consequence of Theorem 3 and 1D no-order result
emotion feature persistence · concept · 0.779
The phenomenon that emotion feature activations remain elevated above baseline beyond local token bursts, measurable as long-range correlation
Autoregressive language models cannot converge to single stored patterns beyond their context window from local interactions alone. · claim · 0.779