concept
active
concept:autoregressive-recurrence · autoregressive recurrence
Transformers are recurrent through autoregression because the K/V stream provides horizontal information flow across positions, even though each forward pass is feedforward.
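The claim above can be illustrated with a toy sketch. It is not taken from the source thread: it assumes a single attention head, a single layer, identity query/key/value projections, and a greedy argmax readout in place of a real unembedding. All function names (`one_hot`, `attend`, `step`, `generate`) are hypothetical. Each call to `step` is a plain feedforward computation, yet it reads keys and values written by earlier positions, so state flows horizontally across the sequence.

```python
import math

def one_hot(token, vocab_size):
    """Toy embedding (assumption): token id -> one-hot vector."""
    return [1.0 if i == token else 0.0 for i in range(vocab_size)]

def attend(query, keys, values):
    """Single-head softmax attention over all cached keys and values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(query))]

def step(x, kv_cache):
    """One feedforward pass for one position.

    Identity projections are an assumption: the input vector serves as
    query, key, and value. The pass writes to and reads from the cache.
    """
    kv_cache.append((x, x))
    keys = [k for k, _ in kv_cache]
    values = [v for _, v in kv_cache]
    return attend(x, keys, values)

def generate(prompt, n_new, vocab_size):
    """Greedy autoregressive loop; expects a non-empty prompt."""
    kv_cache = []
    tokens = list(prompt)
    out = None
    for t in tokens:
        out = step(one_hot(t, vocab_size), kv_cache)
    for _ in range(n_new):
        # Toy readout (assumption): pick the largest output coordinate.
        nxt = max(range(vocab_size), key=lambda i: out[i])
        tokens.append(nxt)
        out = step(one_hot(nxt, vocab_size), kv_cache)
    return tokens, kv_cache
```

In this toy setup, `generate([0, 0, 0, 1], 1, 2)` and `generate([1], 1, 2)` both end their prompt on token `1` but continue differently, because the final position attends to different cached keys and values. The cache also grows by one entry per step, unlike the fixed-size hidden state of an RNN.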
Neighborhood — ranked by edge-count
Papers (1)
paper
Thinkers (1)
thinker
- janusstudies · Author of foundational X thread on transformer information flow; central theoretical contribution to understanding introspection architecture.
Claims (1)
claim
- Cube Flipper and Imago found convergent phenomenology between human meditation and transformer structure.
Methods (1)
method
- mirroring / scaffolding · implements · Method of cultivating introspective behavior by mirroring back a model's self-discoveries, creating feedback loops via ICL.
Concepts (4)
concept
- Algorithmic recurrence · related_to · Processing where the same operation is applied repeatedly via weight sharing, as in RNNs; contrasts with implementational recurrence.
- autoregressive persistence · related_to · Baseline persistence of any probe direction arising from the autoregressive nature of LLMs, not specific to emotion content.
- in-context learning (ICL) · associated_with · Test-time adaptation from prompt or retrieved context with no parameter updates.
- K/V Stream · associated_with · Proposed pathway flowing across positions at each layer; carries key, value, and attention-weighted information horizontally.
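For contrast with the autoregressive case, algorithmic recurrence as described in the list above can be sketched as one function with one fixed set of weights applied at every step. This is a minimal scalar illustration; the weights `w_h` and `w_x` and the function names are hypothetical and not taken from any source.

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=1.0):
    """One recurrent update; the same weights are reused at every step."""
    return math.tanh(w_h * h + w_x * x)

def run_rnn(xs, h0=0.0):
    """Apply rnn_step repeatedly, carrying a fixed-size hidden state."""
    h = h0
    states = []
    for x in xs:
        h = rnn_step(h, x)
        states.append(h)
    return states
```

Here the state passed between steps is a single number whose size never changes, whereas the K/V stream accumulates one entry per position.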
Artifacts (2)
artifact
- Original thread by janus explaining transformer information highways and introspection capabilities, posted on X.
- Twitter thread with infographics explaining information flow and recurrence in transformers, arguing LLMs can introspect.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The mechanism by which LLMs generate text: repeatedly drawing a token from the next-token distribution and appending it to the context.
- Recurrence via feedback loops where individual neurons process information repeatedly.
- Second model system studied; used to show why flat autoregressive LLMs struggle with long-range coherence.
- The training parallelization technique that latent methods are difficult to train with.
- Statistical technique where outputs are regressed on previous values; used in language generation.
- Training objective interpretable as optimizing a diverse set of tasks; thus subject to multitask scaling convergence pressures
- Support for RPT-1.
- Claim formalizing the Anima Labs idea that transformers are effectively recurrent due to K/V stream.