concept
active
concept:k-composition
K-Composition
A form of attention head composition in which a head's W_K reads from a subspace written to by a previous head, so that its keys depend on that head's output; central to how induction heads are implemented.
Neighborhood — ranked by edge-count
Thinkers (1)
- Neel Nanda · studies · External commenter; resolved an apparent counterexample to the linear representation hypothesis
Methods (1)
- Measuring Q-, K-, and V-composition between attention heads by computing the Frobenius norm of the product of the relevant matrices, divided by the product of the Frobenius norms of the individual matrices
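The measurement described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes a column-vector convention in which each head has a combined query-key matrix `W_QK = W_Q^T W_K` and a combined output-value matrix `W_OV = W_O W_V`, both of shape `(d_model, d_model)`. The helper names (`composition_score`, `head_matrices`, `random_head`) and the random weights are hypothetical and stand in for weights from a real model.

```python
import numpy as np


def composition_score(a: np.ndarray, b: np.ndarray) -> float:
    """Frobenius norm of a @ b, divided by the product of the
    Frobenius norms of a and b. Lies in [0, 1] by submultiplicativity."""
    num = np.linalg.norm(a @ b, ord="fro")
    den = np.linalg.norm(a, ord="fro") * np.linalg.norm(b, ord="fro")
    return float(num / den)


def head_matrices(w_q, w_k, w_v, w_o):
    """Combine per-head weights into (W_QK, W_OV).

    Assumed shapes: w_q, w_k, w_v are (d_head, d_model); w_o is (d_model, d_head).
    """
    w_qk = w_q.T @ w_k  # (d_model, d_model), bilinear form for attention scores
    w_ov = w_o @ w_v    # (d_model, d_model), map from residual stream to residual stream
    return w_qk, w_ov


rng = np.random.default_rng(0)
d_model, d_head = 16, 4  # toy sizes, chosen only for illustration


def random_head():
    """Hypothetical stand-in for reading one head's weights from a model."""
    w_q, w_k, w_v = (rng.standard_normal((d_head, d_model)) for _ in range(3))
    w_o = rng.standard_normal((d_model, d_head))
    return head_matrices(w_q, w_k, w_v, w_o)


w_qk1, w_ov1 = random_head()  # head in an earlier layer
w_qk2, w_ov2 = random_head()  # head in a later layer

# Q-composition: the later head's query side reads the earlier head's output.
q_score = composition_score(w_qk2.T, w_ov1)
# K-composition: the later head's key side reads the earlier head's output.
k_score = composition_score(w_qk2, w_ov1)
# V-composition: the later head's value path reads the earlier head's output.
v_score = composition_score(w_ov2, w_ov1)

print(f"Q={q_score:.3f}  K={k_score:.3f}  V={v_score:.3f}")
```

A score near zero means the later head reads almost nothing from the subspace the earlier head writes to. With random weights like these the scores are only a baseline; in a trained model, an unusually high K-composition score between two heads would be a candidate signal for the kind of circuit described in this entry.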
Concepts (3)
- Induction Heads · implements · Mechanistic circuits in transformers documented by Olsson et al. (2022), cited as evidence for the pattern-repository assumption
- V-Composition · associated_with · A form of attention head composition where W_V reads from a subspace affected by a previous head, creating virtual attention heads
- Q-Composition · associated_with · A form of attention head composition where W_Q reads from a subspace affected by a previous head, allowing more complex attention patterns
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The wiring together of processes to form new processes in process theory
- Central concept: how meaning of wholes depends on meanings of parts and their structural arrangement; multiple formulations explored (Frege, Schrödinger, Whitehead, LEGO).
- Modeling function application via feedback loops between processes, ping-ponging tokens.
- Strongest form of compositional theory requiring full decomposability and meaningful parts.
- Composition where a whole cannot be meaningfully decomposed into its original parts, central to Schrödinger compositional theory
- Formal linguistic principle: meaning of a whole depends only on meanings of parts and how they are fitted together (bottom-up meaning flow).
- Primary guiding question for the paper; explores multiple formulations and uses of the term.
- Unsupervised feature-finding method using cluster centroid difference as feature direction