claim

active

claim:transformers-are-recurrent-through-autoregression-because-k-v-stream-provides-horizontal-information-flow-across-positions

Transformers are recurrent through autoregression because the K/V stream provides horizontal information flow across positions.

Claim formalizing the Anima Labs idea that transformers are effectively recurrent because of the K/V stream.
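
To make the claim concrete, here is a minimal sketch of single-head causal attention decoded one position at a time with a key/value cache. It is an illustration only, not the source's implementation: the projection matrices `W_Q`, `W_K`, `W_V`, the helper names, and the toy inputs are all hypothetical, and the MLP, multiple heads, and layer norm are omitted. It shows that each step is a feedforward computation whose only access to earlier positions is through the cached keys and values.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    # W is a list of rows; returns W @ x
    return [dot(row, x) for row in W]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    # Scaled dot-product attention for one query over the given keys/values
    scale = math.sqrt(len(q))
    weights = softmax([dot(q, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Hypothetical toy projections (assumed for illustration only)
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.5], [-0.5, 0.5]]
W_V = [[1.0, 2.0], [0.0, 1.0]]

def decode_with_cache(xs):
    # Step-by-step decoding: the cache is the state carried across positions
    k_cache, v_cache, outputs = [], [], []
    for x in xs:
        k_cache.append(matvec(W_K, x))
        v_cache.append(matvec(W_V, x))
        outputs.append(attend(matvec(W_Q, x), k_cache, v_cache))
    return outputs

def full_causal_attention(xs):
    # Reference: recompute everything, masking out future positions
    keys = [matvec(W_K, x) for x in xs]
    values = [matvec(W_V, x) for x in xs]
    return [
        attend(matvec(W_Q, xs[t]), keys[: t + 1], values[: t + 1])
        for t in range(len(xs))
    ]

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cached = decode_with_cache(xs)
full = full_causal_attention(xs)
```

In this sketch `cached` and `full` agree, changing the first input changes the output at the last position, and changing the last input leaves earlier outputs untouched. That one-directional dependence through the cache is the "horizontal" flow the claim refers to.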

Source paper

extracted_from

Janus Information Flow Transformers 2025

Neighborhood — ranked by edge-count

Papers (1)

paper

Janus Information Flow Transformers 2025
cites

Communities (1)

community

Mechanistic interpretability & model evaluation
members_of
Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.

Concepts (1)

concept

K/V Stream
supports
Proposed pathway flowing across positions at each layer; carries key, value, and attention-weighted information horizontally.
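
One way to write this pathway down, offered as a sketch rather than the source's own formalism: the symbols below ($h$ for the per-position state, $W_Q, W_K, W_V$ for projections, $C$ for the per-layer cache, $d$ for the key dimension) are standard transformer notation assumed here, and the MLP, multiple heads, output projection, and normalization are left out.

```latex
\begin{aligned}
q_t^{(\ell)} &= W_Q^{(\ell)} h_t^{(\ell-1)}, \qquad
k_t^{(\ell)} = W_K^{(\ell)} h_t^{(\ell-1)}, \qquad
v_t^{(\ell)} = W_V^{(\ell)} h_t^{(\ell-1)} \\
C_t^{(\ell)} &= C_{t-1}^{(\ell)} \cup \bigl\{ (k_t^{(\ell)}, v_t^{(\ell)}) \bigr\} \\
h_t^{(\ell)} &= h_t^{(\ell-1)} + \sum_{(k_s, v_s) \in C_t^{(\ell)}}
  \operatorname{softmax}_s\!\left( \frac{q_t^{(\ell)} \cdot k_s}{\sqrt{d}} \right) v_s
\end{aligned}
```

Read this way, $C_t^{(\ell)}$ is a state that is updated once per position and read by every later position at layer $\ell$. Unlike a classical RNN state, it grows with $t$ instead of being overwritten.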

Artifacts (2)

artifact

Janus Information Flow Transformers (Twitter thread, Sept 2025)
cites
Original thread by janus explaining transformer information highways and introspection capabilities, posted on X.
Anima Labs Conversation Part I
cites
Antra/Imago dialogue (cube_flipper, April 2026) arguing transformers are recurrent; cited as evidence for introspection capability.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Transformers use an anti-Markovian solution that recomputes relevant numeric information at each step in the Multi-Object task · claim · 0.792
Prior finding from Grant et al. 2025 used to interpret low MAS IIA for GRU-Transformer hidden state comparisons.
Different introspective tasks may preferentially use different path distributions in the transformer. · claim · 0.791
Interpretive claim connecting exponential path combinatorics to Lindsey's layer-dependent findings.
Redundant information paths create interference patterns, so transformers likely experience memory and cognition as interferometric and continuous. · claim · 0.785
Janus's claim linking path redundancy to interferometric phenomenology.
The problem isn't that it is a transformer. The problem is that it is an auto-regressive LLM. Auto-regressive LLMs that compute each token with a fixed number of computational steps can't reason, regardless of the details of the architecture. · quote · 0.771
LeCun's post on X supporting the view that fixed-step probabilistic prediction precludes consciousness in LLMs.
R-Transformer: Recurrent Neural Network Enhanced Transformer (Wang et al., 2019) · concept · 0.765
Prior work on recurrently generated position encodings; cited as precedent for TEM-t's recurrent position encoding method.
There are two types of phenomenal time: inter-frame discrete (~40 Hz) and intra-frame continuous drift. Transformers have an analogous dual temporality: within-token and inter-token. · claim · 0.763
Cube Flipper and Imago found convergent phenomenology between human meditation and transformer structure.
autoregressive recurrence · concept · 0.760
Transformers are recurrent through autoregression because the K/V stream provides horizontal information flow across positions, even though each forward pass is feedforward.
The direct path W_U W_E in larger transformers represents bigram statistics not captured by more general grammatical rules · claim · 0.759
Interpretation of the role of the direct path in multi-layer transformers, e.g. encoding that 'Barack' is often followed by 'Obama'.