method
active
method:freezing-attention-patterns-trick
Freezing Attention Patterns Trick
A conceptual technique: fix (freeze) the attention patterns so that the transformer becomes a purely linear function of its tokens, which allows the OV and QK circuits to be analyzed independently.
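The linearity claim can be checked numerically. The following is a minimal sketch, assuming a one-layer, single-head, attention-only model with random weights, no layer norm, and no biases; the shapes, the row-vector convention, and every function name here are illustrative assumptions, not taken from the source entry. It compares the logits with the attention pattern held fixed against the logits with the pattern recomputed from the input.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vocab, d_model, d_head, seq = 7, 6, 3, 5  # toy sizes (assumed)

# Random weights for a toy attention-only model (row-vector convention).
W_E = rng.normal(size=(n_vocab, d_model))   # token embedding
W_Q = rng.normal(size=(d_model, d_head))    # query projection
W_K = rng.normal(size=(d_model, d_head))    # key projection
W_V = rng.normal(size=(d_model, d_head))    # value projection
W_O = rng.normal(size=(d_head, d_model))    # output projection
W_U = rng.normal(size=(d_model, n_vocab))   # unembedding


def attention_pattern(tokens):
    """Causal softmax attention pattern (the QK side), shape (seq, seq)."""
    x = tokens @ W_E
    scores = (x @ W_Q) @ (x @ W_K).T / np.sqrt(d_head)
    mask = np.tril(np.ones_like(scores, dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


def logits(tokens, pattern=None):
    """Logits of the toy model; pass `pattern` to freeze the attention."""
    if pattern is None:
        pattern = attention_pattern(tokens)  # pattern depends on the input
    x = tokens @ W_E
    head_out = pattern @ x @ W_V @ W_O       # OV side, mixed by the pattern
    return (x + head_out) @ W_U              # residual stream -> logits


def one_hot(ids):
    return np.eye(n_vocab)[ids]


t1 = one_hot(rng.integers(n_vocab, size=seq))
t2 = one_hot(rng.integers(n_vocab, size=seq))
a, b = 0.3, -1.7

frozen = attention_pattern(t1)  # freeze the pattern computed on t1

# With the pattern frozen, the map from tokens to logits is linear.
frozen_is_linear = np.allclose(
    logits(a * t1 + b * t2, frozen),
    a * logits(t1, frozen) + b * logits(t2, frozen),
)

# With the pattern recomputed from the input, the softmax breaks linearity.
free_is_linear = np.allclose(
    logits(a * t1 + b * t2),
    a * logits(t1) + b * logits(t2),
)

print("frozen pattern linear:", frozen_is_linear)
print("recomputed pattern linear:", free_is_linear)
```

In this sketch the only nonlinearity is the softmax inside `attention_pattern`, so fixing its output leaves a composition of matrix products. A model with MLP layers or layer norm would need further assumptions before the same check applies.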
Neighborhood — ranked by edge-count
Concepts (1)
concept
- OV Circuit · about · The circuit formed by W_U W_OV^h W_E, which describes how a given token affects the output logits if it is attended to
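As an illustration of the OV circuit entry above, here is a minimal sketch that builds the vocabulary-by-vocabulary OV matrix for a single head with random weights. It assumes a row-vector convention, so the product appears as `W_E @ W_V @ W_O @ W_U`, which is the transpose-order counterpart of W_U W_OV^h W_E; the shapes and names are illustrative assumptions, and the attention pattern is a stand-in fixed matrix rather than one computed from queries and keys.

```python
import numpy as np

rng = np.random.default_rng(1)
n_vocab, d_model, d_head, seq = 7, 6, 3, 5  # toy sizes (assumed)

W_E = rng.normal(size=(n_vocab, d_model))   # token embedding
W_V = rng.normal(size=(d_model, d_head))    # value projection of one head
W_O = rng.normal(size=(d_head, d_model))    # output projection of one head
W_U = rng.normal(size=(d_model, n_vocab))   # unembedding

# Full OV circuit: entry [src, out] is the contribution to the logit of
# token `out` when the head attends fully to source token `src`.
ov_circuit = W_E @ W_V @ W_O @ W_U

# A frozen causal attention pattern (stand-in values, rows sum to 1).
pattern = np.tril(rng.random(size=(seq, seq)))
pattern = pattern / pattern.sum(axis=-1, keepdims=True)

tokens = np.eye(n_vocab)[rng.integers(n_vocab, size=seq)]  # one-hot inputs

# Head contribution to the logits, computed step by step...
head_logits = pattern @ (tokens @ W_E) @ W_V @ W_O @ W_U
# ...and via the precomputed OV circuit, mixed by the frozen pattern.
head_logits_via_ov = pattern @ tokens @ ov_circuit

print(np.allclose(head_logits, head_logits_via_ov))
```

Because the pattern is held fixed, the head's effect on the logits factors into "where to attend" (`pattern`) and "what attending does" (`ov_circuit`), and each row of `ov_circuit` can be read on its own.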
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Process using Q, K, V to compute a heat map over K and weighted sum of V.
- Attention patterns scaled by the norm of the value vector at each source position, showing how large a vector is moved from each position
- Modification to transformer restricting keys and values to previous time-steps only, mimicking how an agent accumulates experiences.
- Visualizing attention patterns weighted by the norm of value vectors to better show how much information is moved from each position
- Claim supported by VPD's recovery of cross-head attention subcomponents, noted in footnote.
- Identification of algorithms implemented in attention layers, distributed across attention heads · finding · 0.696 · VPD successfully recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.
- Core operation in transformers, computing weighted combinations of previous elements