Janus Information Flow Transformers (Twitter thread, Sept 2025)

Original thread by janus explaining transformer information highways and introspection capabilities, posted on X.

Neighborhood — ranked by edge-count

Papers (1)

paper

Janus Information Flow Transformers 2025
cites

Thinkers (1)

thinker

janus (@repligate)
authored
Author of the thread on transformer information flow; researcher exploring AI and consciousness.

Frameworks (1)

framework

Wolfram Causal Graph
cites
A framework from the Wolfram Physics Project that views computation as a causal graph, with foliations (time slices) specifying the order of computation.

Methods (1)

method

KV caching
cites
Caching the key and value vectors computed at earlier token positions so they need not be recomputed; per the thread, the cache also provides a mechanism for introspection on earlier computations.
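A minimal sketch of the caching idea, assuming a single attention head and a toy, hand-written `project` function standing in for learned Q/K/V projections (both are illustrative assumptions, not taken from the thread). It only shows that appending each position's K/V to a cache gives the same outputs as recomputing them for the whole prefix at every step.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(q, keys, values):
    """Attention for one query over the keys/values of all positions so far."""
    scale = math.sqrt(len(q))
    weights = softmax([dot(q, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def project(x):
    """Hypothetical fixed Q/K/V maps; a real model learns these projections."""
    q = [x[0] + x[1], x[0] - x[1]]
    k = [x[1], x[0]]
    v = [2.0 * x[0], 0.5 * x[1]]
    return q, k, v

def decode_with_cache(tokens):
    """Process one token at a time, storing each position's K/V once."""
    k_cache, v_cache, outputs = [], [], []
    for x in tokens:
        q, k, v = project(x)
        k_cache.append(k)
        v_cache.append(v)
        outputs.append(attend(q, k_cache, v_cache))
    return outputs

def decode_without_cache(tokens):
    """Recompute K/V for the entire prefix at every step."""
    outputs = []
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [project(x)[1] for x in prefix]
        values = [project(x)[2] for x in prefix]
        q = project(tokens[t])[0]
        outputs.append(attend(q, keys, values))
    return outputs

tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
cached = decode_with_cache(tokens)
uncached = decode_without_cache(tokens)
```

In this toy the cached entries depend only on the input token. In a multi-layer model the K/V at higher layers are computed from the residual stream, so the cache holds the results of earlier computation rather than raw inputs.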

Concepts (10)

concept

Residual Stream
cites
Proposed vertical pathway running up through the layers at each token position; it computes the K/V values that feed horizontal information flow.
Introspection
cites
The ability of a model to observe its own past internal states or computations; claimed to be architecturally permitted by transformers.
autoregressive recurrence
cites
Transformers are recurrent through autoregression because the K/V stream provides horizontal information flow across positions, even though each forward pass is feedforward.
K/V Stream
cites
Proposed pathway flowing across positions at each layer; carries key, value, and attention-weighted information horizontally.
Interferometric Cognition
introduces
The idea that redundant information paths create interference patterns, so that memory and cognition are experienced as interferometric and continuous.
exponential path combinatorics
introduces
The number of distinct paths by which information can travel from point A to point B in a transformer is C(m+n, n), which quickly exceeds the number of atoms in the visible universe.
attention computation
cites
Process that uses Q, K, and V to compute attention weights (a heat map) over the keys, then a weighted sum of the values.
K values
cites
In attention, key vectors that advertise which future positions should look at this one.
Q values
cites
In attention, query vectors that ask 'where in the past should I look?' given the current state.
V values
cites
In attention, value vectors that carry the information future positions should receive.
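The Q/K/V roles described above can be sketched as causal scaled dot-product attention. This is a minimal single-head illustration in plain Python; the function name `causal_attention` and the example vectors are assumptions made here, not taken from the thread.

```python
import math

def causal_attention(Q, K, V):
    """Return (heat_map, outputs) for single-head causal attention.

    heat_map[t][s] is the weight position t puts on position s (zero for s > t);
    outputs[t] is the weighted sum of the value vectors up to position t.
    """
    n, d = len(Q), len(Q[0])
    heat_map, outputs = [], []
    for t in range(n):
        # Query at t asks "where in the past should I look?" by scoring keys 0..t.
        scores = [
            sum(q * k for q, k in zip(Q[t], K[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Pad with zeros for future positions, which are never attended to.
        heat_map.append(weights + [0.0] * (n - t - 1))
        # Values carry what the attending position actually receives.
        outputs.append(
            [sum(w * V[s][i] for s, w in enumerate(weights)) for i in range(len(V[0]))]
        )
    return heat_map, outputs

Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
heat_map, outputs = causal_attention(Q, K, V)
```

Each row of `heat_map` sums to 1 and is zero to the right of the diagonal, which reflects that information flows only from earlier positions to later ones.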

Claims (9)

claim

Transformers are recurrent through autoregression because the K/V stream provides horizontal information flow across positions.
cites
Claim formalizing the Anima Labs idea that transformers are effectively recurrent due to the K/V stream.
Transformer architecture permits introspection; saying LLMs cannot introspect on past internal states is wrong.
cites
Janus's central claim that the architecture enables introspection, though usage in practice is a separate question.
Information from point A to B can travel through C(m+n, n) distinct paths, which quickly exceeds the number of atoms in the visible universe.
cites
Janus's mathematical claim about exponential path combinatorics in transformers.
KV caching overcomes statelessness and provides a mechanism for introspection of computations at earlier token positions.
cites
Janus's claim about KV caching as an introspection mechanism.
K values represent 'given current state, where in the future should look here?'
cites
Janus's interpretive claim about key vectors.
Q values represent 'given current state, where in the past should I look?'
cites
Janus's interpretive claim about query vectors.
Redundant information paths create interference patterns, so transformers likely experience memory and cognition as interferometric and continuous.
cites
Janus's claim linking path redundancy to interferometric phenomenology.
A transformer can be viewed as a Wolfram causal graph with foliations specifying computation order.
cites
Janus's interpretive framing of transformers as causal graphs.
V values represent 'given current state, what information should future positions that look here actually receive?'
cites
Janus's interpretive claim about value vectors.
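The path-count claim above can be checked numerically. The sketch below assumes that m and n count steps along the two directions of a grid (for instance layer steps and position steps; the source does not define them) and uses 10^80 as a rough, commonly quoted order of magnitude for the number of atoms in the visible universe. Both are assumptions made for illustration.

```python
import math

# Assumption: rough order-of-magnitude estimate, not a figure from the thread.
ATOMS_ESTIMATE = 10**80

def count_paths_dp(m, n):
    """Count monotone grid paths made of m steps one way and n steps the other."""
    grid = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # A cell is reached from the cell below or the cell to the left.
            grid[i][j] = grid[i - 1][j] + grid[i][j - 1]
    return grid[m][n]

def smallest_square_exceeding(limit):
    """Smallest n for which C(2n, n) is greater than limit."""
    n = 1
    while math.comb(2 * n, n) <= limit:
        n += 1
    return n

# The dynamic-programming count agrees with the closed form C(m+n, n).
assert count_paths_dp(3, 4) == math.comb(7, 4)

threshold = smallest_square_exceeding(ATOMS_ESTIMATE)
```

Here `threshold` is the side length of the smallest square grid whose path count passes the estimate. Real attention can skip over several positions in one step, so this grid model is only one way to read the C(m+n, n) figure.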

Questions (1)

question

How are LLMs actually leveraging the architectural degrees of freedom for introspection in practice?
cites
Janus notes that while architecture permits introspection, it is a separate question how models use it.

Artifacts (1)

artifact

Anima Labs Conversation Part I
cites
Antra/Imago dialogue (cube_flipper, April 2026) arguing transformers are recurrent; cited as evidence for introspection capability.