claim
active
claim:transformers-are-recurrent-through-autoregression-because-k-v-stream-provides-horizontal-information-flow-across-positions
Transformers are recurrent through autoregression because K/V stream provides horizontal information flow across positions.
Claim formalizing the Anima Labs idea that transformers are effectively recurrent because of the K/V stream.
Source paper
extracted_from
Neighborhood: ranked by edge count
Papers (1)
paper
Communities (1)
community
- Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.
Concepts (1)
concept
- K/V Stream (supports): Proposed pathway flowing across positions at each layer; carries key, value, and attention-weighted information horizontally.
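
The mechanical part of this claim can be illustrated with a minimal sketch. It assumes a single attention head in a single layer, random weights, and a fixed input sequence rather than sampled tokens. The names `step`, `full_causal_attention`, `Wq`, `Wk`, and `Wv` are hypothetical and come from no particular library. Each call to `step` is purely feedforward, yet it reads and extends a cache of keys and values carried over from earlier positions, which is the sense in which the K/V stream behaves like recurrent state.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(q, keys, values):
    # Scaled dot-product attention of one query over the given keys/values.
    scale = math.sqrt(len(q))
    weights = softmax([dot(q, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def full_causal_attention(xs, Wq, Wk, Wv):
    # Recompute from scratch: position t attends to positions 0..t.
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return [attend(matvec(Wq, x), keys[:t + 1], values[:t + 1])
            for t, x in enumerate(xs)]

def step(x, cache, Wq, Wk, Wv):
    # One feedforward step: (input, carried state) -> (output, new state).
    keys, values = cache
    keys = keys + [matvec(Wk, x)]
    values = values + [matvec(Wv, x)]
    return attend(matvec(Wq, x), keys, values), (keys, values)

random.seed(0)
D = 4  # hypothetical model width

def rand_matrix():
    return [[random.gauss(0, 1) for _ in range(D)] for _ in range(D)]

Wq, Wk, Wv = rand_matrix(), rand_matrix(), rand_matrix()
xs = [[random.gauss(0, 1) for _ in range(D)] for _ in range(5)]

# Stepwise pass: the cache is the only thing carried between positions.
cache = ([], [])
stepwise = []
for x in xs:
    out, cache = step(x, cache, Wq, Wk, Wv)
    stepwise.append(out)

# The stepwise pass should match recomputing causal attention in full.
full = full_causal_attention(xs, Wq, Wk, Wv)
max_diff = max(abs(a - b)
               for row_s, row_f in zip(stepwise, full)
               for a, b in zip(row_s, row_f))
```

Unlike the fixed-size hidden state of a classical RNN, the cache here grows by one key and one value per position, so the analogy concerns how information is carried forward, not the size of the state. The sketch also omits the sampling loop in which each emitted token becomes the next input.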
Artifacts (2)
artifact
- Original thread by janus explaining transformer information highways and introspection capabilities, posted on X.
- Antra/Imago dialogue (cube_flipper, April 2026) arguing transformers are recurrent; cited as evidence for introspection capability.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Prior finding from Grant et al. 2025 used to interpret low MAS IIA for GRU-Transformer hidden state comparisons.
- Interpretive claim connecting exponential path combinatorics to Lindsey's layer-dependent findings.
- Janus's claim linking path redundancy to interferometric phenomenology.
- LeCun's post on X supporting the view that fixed-step probabilistic prediction precludes consciousness in LLMs.
- Prior work on recurrently generated position encodings; cited as precedent for TEM-t's recurrent position encoding method.
- Cube Flipper and Imago found convergent phenomenology between human meditation and transformer structure.
- Transformers are recurrent through autoregression because the K/V stream provides horizontal information flow across positions, even though each forward pass is feedforward.
- Interpretation of the role of the direct path in multi-layer transformers, e.g., encoding that 'Barack' is often followed by 'Obama'.