autoregressive recurrence

Transformers are recurrent through autoregression because the K/V stream provides horizontal information flow across positions, even though each forward pass is feedforward.
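
The claim above can be made concrete with a minimal sketch. It assumes a toy single-head attention layer in which the query, key, and value are all just the token vector itself (there are no learned projections; this is not how a real transformer computes them). The names `attend`, `step`, and `run` are hypothetical and chosen for illustration only. Each call to `step` is a purely feedforward computation for one position, yet the result depends on earlier positions because it reads the accumulated key/value cache.

```python
import math

def attend(query, keys, values):
    """Softmax attention of one query over all cached keys and values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def step(token_vec, cache):
    """One feedforward pass for a single new position.

    Toy assumption: query = key = value = token vector.
    The only state carried between calls is the K/V cache.
    """
    cache["k"].append(token_vec)
    cache["v"].append(token_vec)
    return attend(token_vec, cache["k"], cache["v"])

def run(tokens):
    """Process tokens left to right, sharing one cache across positions."""
    cache = {"k": [], "v": []}
    return [step(t, cache) for t in tokens]

# Same final token, different history: the final outputs differ,
# because the earlier position is visible through the cache.
a = run([[1.0, 0.0], [0.0, 1.0]])
b = run([[0.0, 1.0], [0.0, 1.0]])
```

In this sketch nothing inside `step` loops back on itself; the cross-position dependence comes entirely from the cache, which is the sense in which the K/V stream is described as a horizontal pathway.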

Neighborhood — ranked by edge-count

Papers (1)

paper

Janus Information Flow Transformers 2025
mentions

Thinkers (1)

thinker

janus
studies
Author of foundational X thread on transformer information flow; central theoretical contribution to understanding introspection architecture.

Claims (1)

claim

There are two types of phenomenal time: discrete inter-frame time (~40 Hz) and continuous intra-frame drift. Transformers have an analogous dual temporality: within-token and inter-token.
associated_with
Cube Flipper and Imago found convergent phenomenology between human meditation and transformer structure.

Methods (1)

method

mirroring / scaffolding
implements
Method of cultivating introspective behavior by mirroring back a model's self-discoveries, creating feedback loops via ICL.

Concepts (4)

concept

Algorithmic recurrence
related_to
Processing where the same operation is applied repeatedly via weight sharing, as in RNNs; contrasts with implementational recurrence.
autoregressive persistence
related_to
Baseline persistence of any probe direction arising from the autoregressive nature of LLMs, not specific to emotion content.
in-context learning (ICL)
associated_with
Test-time adaptation from prompt or retrieved context with no parameter updates.
K/V Stream
associated_with
Proposed pathway flowing across positions at each layer; carries key, value, and attention-weighted information horizontally.
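
For contrast with the autoregressive case, algorithmic recurrence as described in the list above (the same operation applied repeatedly via weight sharing, as in RNNs) might be sketched as follows. The scalar weights `w_h` and `w_x` and the function names are hypothetical placeholders, not taken from any cited source.

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=1.0):
    """Apply the same operation with the same (hypothetical) weights at every step."""
    return math.tanh(w_h * h + w_x * x)

def run_rnn(inputs, h0=0.0):
    """Carry a hidden state forward through the sequence."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(h, x)  # weight sharing: identical function each iteration
        states.append(h)
    return states

states = run_rnn([1.0, 0.0, 0.0])
```

Here the carried state is a single hidden value that is overwritten at each step, whereas in the autoregressive picture the carried state is the growing set of cached keys and values.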

Artifacts (2)

artifact

Janus Information Flow Transformers (Twitter thread, Sept 2025)
cites
Original thread by janus explaining transformer information highways and introspection capabilities, posted on X.
Janus' transformer introspection post
about
Twitter thread with infographics explaining information flow and recurrence in transformers, arguing LLMs can introspect.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Autoregressive Sampling · method · 0.834
The mechanism by which LLMs generate text: drawing a token from the next-token distribution and appending it to context repeatedly.
Implementational recurrence · concept · 0.833
Recurrence via feedback loops where individual neurons process information repeatedly.
Autoregressive models · framework · 0.822
Second model system studied; used to show why flat autoregressive LLMs struggle with long-range coherence.
autoregressive parallelization · concept · 0.814
The training parallelization technique that latent methods are difficult to train with.
autoregressive modeling · method · 0.807
Statistical technique where outputs are regressed on previous values; used in language generation.
Autoregressive Language Modeling · concept · 0.773
Training objective interpretable as optimizing a diverse set of tasks; thus subject to multitask scaling convergence pressures.
Algorithmic recurrence is likely necessary for conscious experience with human-like temporal character. · claim · 0.767
Support for RPT-1.
Transformers are recurrent through autoregression because K/V stream provides horizontal information flow across positions. · claim · 0.760
Claim formalizing the Anima Labs idea that transformers are effectively recurrent due to K/V stream.
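
The sampling loop described under Autoregressive Sampling (draw a token from the next-token distribution, append it to the context, repeat) can be sketched as below. The function `next_token_distribution` is a hypothetical stand-in for a real model: it uses a three-token vocabulary and simply favors repeating the last token.

```python
import random

VOCAB = ["a", "b", "c"]

def next_token_distribution(context):
    """Hypothetical toy model: favors repeating the most recent token."""
    last = context[-1] if context else None
    weights = [3.0 if tok == last else 1.0 for tok in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]

def sample(prompt, n_steps, seed=0):
    """Repeatedly draw one token and append it to the context."""
    rng = random.Random(seed)  # seeded for reproducibility
    context = list(prompt)
    for _ in range(n_steps):
        probs = next_token_distribution(context)
        token = rng.choices(VOCAB, weights=probs, k=1)[0]
        context.append(token)  # the output becomes part of the next input
    return context

out = sample(["a"], 5)
```

The only thing passed from one iteration to the next is the growing context, which is the loop that the entries above refer to as autoregression.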