method
active
method:frobenius-norm-composition-measurement
Frobenius Norm Composition Measurement
Measures Q-, K-, and V-composition between attention heads as the Frobenius norm of the product of the relevant weight matrices, divided by the product of the Frobenius norms of the individual matrices.
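The ratio described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the original implementation: the function name `composition_score` and the toy matrices are hypothetical, and which weight matrices count as "relevant" for Q-, K-, or V-composition is not specified in this entry, so the example uses generic stand-ins.

```python
import numpy as np


def composition_score(later: np.ndarray, earlier: np.ndarray) -> float:
    """Return ||later @ earlier||_F / (||later||_F * ||earlier||_F).

    `later` and `earlier` are hypothetical stand-ins for the weight
    matrices of a later-layer head and an earlier-layer head.
    """
    numerator = np.linalg.norm(later @ earlier, ord="fro")
    denominator = np.linalg.norm(later, ord="fro") * np.linalg.norm(earlier, ord="fro")
    return float(numerator / denominator)


# Toy check with rank-1 projections onto coordinate axes (assumed data).
p0 = np.zeros((4, 4))
p0[0, 0] = 1.0  # projects onto axis 0
p1 = np.zeros((4, 4))
p1[1, 1] = 1.0  # projects onto axis 1

aligned = composition_score(p0, p0)   # same subspace: 1.0
disjoint = composition_score(p0, p1)  # orthogonal subspaces: 0.0

# Random stand-in matrices; the score always lies in [0, 1] because the
# Frobenius norm is submultiplicative.
rng = np.random.default_rng(0)
d_model = 16  # assumed size, for illustration only
random_score = composition_score(
    rng.standard_normal((d_model, d_model)),
    rng.standard_normal((d_model, d_model)),
)
```

Dividing by the individual norms makes the score insensitive to the overall scale of either matrix, so head pairs can be compared on a common 0 to 1 range. Unrelated random matrices still give a nonzero score, so measured values are usually read relative to such a baseline.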
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- K-Composition (about): A form of attention head composition where W_K reads from a subspace affected by a previous head; central to how induction heads are implemented
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Formal linguistic principle: meaning of a whole depends only on meanings of parts and how they are fitted together (bottom-up meaning flow).
- Result from applying the Frobenius norm composition measurement to all attention head pairs in the two-layer model
- Scaling aggregated gradient by the maximum gradient norm among tasks.
- Training-free technique normalizing all task gradients to the maximum gradient norm magnitude
- Process theory variant where composition is non-trivial (whole cannot be decomposed meaningfully) and all ingredients have clear ontological counterparts in reality.
- Central concept: how meaning of wholes depends on meanings of parts and their structural arrangement; multiple formulations explored (Frege, Schrödinger, Whitehead, LEGO).
- Re-running probabilistic bisection on each fine-tuned checkpoint to normalize first-attempt difficulty
- Experimental variable sweeping proportion of self-correction to normal response training examples from 10% to 90%