concept
active
concept:k-composition

K-Composition

A form of attention head composition where a later head's W_K reads from a residual-stream subspace written to by a previous head; central to how induction heads are implemented

Neighborhood — ranked by edge-count

Thinkers (1)

thinker
  • Neel Nanda
    studies
    External commenter; resolved apparent counterexample to linear representation hypothesis

Methods (1)

method
  • Measuring Q-, K-, V-composition between attention heads by computing the Frobenius norm of the product of relevant matrices divided by norms of individual matrices
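
The measurement described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation. It assumes a row-vector convention in which W_Q, W_K, W_V have shape (d_model, d_head) and W_O has shape (d_head, d_model); the function names and the toy weights are hypothetical and chosen only for this example.

```python
import numpy as np


def composition_score(a, b):
    """Frobenius norm of a @ b divided by the Frobenius norms of a and b."""
    product_norm = np.linalg.norm(a @ b, ord="fro")
    return product_norm / (np.linalg.norm(a, ord="fro") * np.linalg.norm(b, ord="fro"))


def head_circuits(w_q, w_k, w_v, w_o):
    """Return the (d_model, d_model) QK and OV matrices of one head.

    Assumed convention: residual vectors are rows, so a head's output is
    x @ w_v @ w_o and its attention score is (x_dst @ w_q) . (x_src @ w_k).
    """
    return w_q @ w_k.T, w_v @ w_o


def q_composition(qk_late, ov_early):
    # Query side of the later head reads what the earlier head wrote.
    return composition_score(ov_early, qk_late)


def k_composition(qk_late, ov_early):
    # Key side of the later head reads what the earlier head wrote.
    return composition_score(qk_late, ov_early.T)


def v_composition(ov_late, ov_early):
    # Value side of the later head reads what the earlier head wrote.
    return composition_score(ov_early, ov_late)


# Toy weights with d_model = 4 and d_head = 2.
eye = np.eye(4)
reads_01 = eye[:, :2]    # (4, 2): reads residual dimensions 0 and 1
reads_23 = eye[:, 2:]    # (4, 2): reads residual dimensions 2 and 3
writes_23 = eye[2:, :]   # (2, 4): writes to residual dimensions 2 and 3

# Earlier head: reads dimensions 0-1, writes dimensions 2-3.
_, ov_early = head_circuits(reads_01, reads_01, reads_01, writes_23)

# Later head whose keys read exactly the subspace the earlier head wrote to.
qk_aligned, _ = head_circuits(reads_01, reads_23, reads_01, writes_23)

# Later head whose keys read a subspace the earlier head never wrote to.
qk_disjoint, _ = head_circuits(reads_01, reads_01, reads_01, writes_23)

k_aligned = k_composition(qk_aligned, ov_early)
k_disjoint = k_composition(qk_disjoint, ov_early)
q_aligned = q_composition(qk_aligned, ov_early)
```

With these toy weights the K-composition score is positive only when the later head's keys read from the subspace the earlier head wrote to, and it is zero when the two subspaces are disjoint. Because the Frobenius norm is submultiplicative, the score always lies between 0 and 1; it is undefined if either matrix is all zeros.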

Concepts (3)

concept
  • Induction Heads
    implements
    Mechanistic circuits in transformers documented by Olsson et al. 2022, cited as evidence for pattern-repository assumption
  • V-Composition
    associated_with
    A form of attention head composition where W_V reads from a subspace affected by a previous head, creating virtual attention heads
  • Q-Composition
    associated_with
    A form of attention head composition where W_Q reads from a subspace affected by a previous head, allowing more complex attention patterns

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • composition · concept · 0.801
    The wiring together of processes to form new processes in process theory
  • Compositionality · concept · 0.753
    Central concept: how meaning of wholes depends on meanings of parts and their structural arrangement; multiple formulations explored (Frege, Schrödinger, Whitehead, LEGO).
  • Modeling function application via feedback loops between processes, ping-ponging tokens.
  • Strongest form of compositional theory requiring full decomposability and meaningful parts.
  • Composition where a whole cannot be meaningfully decomposed into its original parts, central to Schrödinger compositional theory
  • Formal linguistic principle: meaning of a whole depends only on meanings of parts and how they are fitted together (bottom-up meaning flow).
  • Primary guiding question for the paper; explores multiple formulations and uses of the term.
  • Unsupervised feature-finding method using cluster centroid difference as feature direction