attention computation

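Process that uses queries (Q), keys (K), and values (V): each query is scored against every key, the scores are normalized (typically with a softmax) into attention weights that form a heat map over K, and the output is the corresponding weighted sum of V.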
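A minimal sketch of this computation, assuming the standard single-head scaled dot-product form with softmax normalization and no masking. The function names and toy inputs are illustrative only and do not come from the cited thread.

```python
import math

def softmax(row):
    """Normalize a list of scores into non-negative weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Single-head scaled dot-product attention on lists of vectors.

    Q: n_q query vectors of length d
    K: n_k key vectors of length d
    V: n_k value vectors of length d_v
    Returns (outputs, weights).
    """
    d = len(K[0])
    scale = 1.0 / math.sqrt(d)
    # One row of weights per query: the "heat map" over the keys.
    weights = [
        softmax([scale * sum(q_i * k_i for q_i, k_i in zip(q, k)) for k in K])
        for q in Q
    ]
    # Each output is the weighted sum of the value vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in weights
    ]
    return outputs, weights

# Hypothetical toy inputs: 2 queries, 3 keys, 3 one-hot values.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
outputs, weights = attention(Q, K, V)
```

Because the toy values are one-hot, each output row equals the corresponding row of attention weights, which makes the heat map directly readable from the output.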

Neighborhood — ranked by edge-count

paper

artifact

Janus Information Flow Transformers (Twitter thread, Sept 2025)
cites
Original thread by janus explaining transformer information highways and introspection capabilities, posted on X.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

attention mechanism · concept · 0.848
Core operation in transformers, computing weighted combinations of previous elements
Attention Schema · concept · 0.815
A predictive model representing and controlling attention; central to attention schema theory.
Focused Attention Cycle · concept · 0.801
Formal model of meditation phenomenology: focus → distraction → awareness of distraction → redirection, derived from active inference.
Identification of algorithms implemented in attention layers, distributed across attention heads · finding · 0.797
VPD successfully recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.
Attention algorithms are usually distributed across attention heads · claim · 0.794
Claim supported by VPD's recovery of cross-head attention subcomponents, noted in footnote.
Selective attention mechanism · concept · 0.791
Mechanism that selects information from modules for representation in the global workspace.
Self-attention · concept · 0.789
A form of key-query attention within a single input sequence; core to Transformers.
Attention And Intention Economies · concept · 0.786