concept
active
concept:attention-computation
attention computation
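Process that uses queries (Q), keys (K), and values (V): each query is scored against the keys to produce a heat map of attention weights over K, and those weights form a weighted sum of V.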
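The description above can be made concrete with a minimal sketch. The entry does not say how the heat map is computed, so the scaled dot-product scoring and softmax normalization below are assumptions taken from the standard Transformer formulation; the function name `attention` and the toy shapes are hypothetical.

```python
import numpy as np

def attention(Q, K, V):
    """Sketch of attention: heat map over K, then weighted sum of V.

    Assumes scaled dot-product scores and a softmax over keys
    (standard formulation; not specified by the entry itself).
    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    """
    d_k = K.shape[-1]
    # Score every query against every key: shape (n_queries, n_keys).
    scores = Q @ K.T / np.sqrt(d_k)
    # Subtract the row-wise max before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    # Normalize so each query's weights over the keys sum to 1 (the "heat map").
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the value vectors: shape (n_queries, d_v).
    output = weights @ V
    return output, weights

# Toy usage with random inputs (hypothetical sizes).
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries
K = rng.normal(size=(3, 4))   # 3 keys
V = rng.normal(size=(3, 5))   # 3 values
output, weights = attention(Q, K, V)
```

Each row of `weights` is one query's heat map over the three keys, and the matching row of `output` is the corresponding weighted sum of the value vectors.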
Neighborhood — ranked by edge-count
Papers (1)
paper
Artifacts (1)
artifact
- Original thread by janus explaining transformer information highways and introspection capabilities, posted on X.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core operation in transformers, computing weighted combinations of previous elements
- A predictive model representing and controlling attention; central to attention schema theory.
- Formal model of meditation phenomenology: focus → distraction → awareness of distraction → redirection, derived from active inference.
- Identification of algorithms implemented in attention layers, distributed across attention heads
- finding · 0.797 · VPD successfully recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.
- Claim supported by VPD's recovery of cross-head attention subcomponents, noted in footnote.
- Mechanism that selects information from modules for representation in the global workspace.
- A form of key-query attention within a single input sequence; core to Transformers.