Freezing Attention Patterns Trick

A conceptual technique: fixing the attention patterns makes the transformer a purely linear function of its input tokens, enabling independent analysis of the OV and QK circuits.
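As a minimal sketch of why freezing helps, the snippet below treats one head's attention pattern as a constant matrix and checks that the head's output is then linear in its inputs. The helper names (`matmul`, `frozen_head`, `combine`), the toy values, and the row-vector convention (one row per position) are assumptions made for illustration and are not part of the original entry.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def frozen_head(x, pattern, w_ov):
    """One attention head with its pattern held fixed (hypothetical helper).

    x:       positions x d_model inputs, one row per position
    pattern: positions x positions attention weights, treated as constants
    w_ov:    d_model x d_model combined value-output map (row-vector convention)
    """
    return matmul(pattern, matmul(x, w_ov))


def combine(a, x, b, y):
    """Elementwise a*x + b*y for two equally shaped matrices."""
    return [[a * p + b * q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


# Toy values, chosen only for illustration.
pattern = [[1.0, 0.0, 0.0],
           [0.5, 0.5, 0.0],
           [0.25, 0.25, 0.5]]  # causal, each row sums to 1
w_ov = [[1.0, 2.0],
        [0.0, -1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]
y = [[0.5, -1.0], [4.0, 0.0], [1.0, 1.0]]

# Linearity: f(2x - 3y) should equal 2 f(x) - 3 f(y) when the pattern is frozen.
lhs = frozen_head(combine(2.0, x, -3.0, y), pattern, w_ov)
rhs = combine(2.0, frozen_head(x, pattern, w_ov), -3.0, frozen_head(y, pattern, w_ov))
is_linear = all(
    math.isclose(p, q, abs_tol=1e-9)
    for row_l, row_r in zip(lhs, rhs)
    for p, q in zip(row_l, row_r)
)
print(is_linear)
```

In a real model the pattern itself depends on the input through the softmax over query-key scores, so this linearity holds only while the pattern is treated as fixed.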

Neighborhood — ranked by edge-count

Concepts (1)

concept

OV Circuit
about
The circuit formed by W_U W_OV^h W_E for head h, which describes how a given token affects the output logits if it is attended to.
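To make the shape of this product concrete, here is a small sketch with made-up integer matrices. It follows the column-vector ordering of the formula, so `w_e` is d_model x vocab and `w_u` is vocab x d_model; the matrices, the sizes, and the `logit_effect` helper are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


# Toy matrices (assumed values): vocab size 3, d_model 2.
w_e = [[1, 0, 1],
       [0, 1, 1]]   # d_model x vocab: column j embeds token j
w_ov = [[0, 1],
        [1, 0]]     # d_model x d_model: one head's value-output map
w_u = [[1, 0],
       [0, 1],
       [1, 1]]      # vocab x d_model: unembedding

# Full OV circuit W_U W_OV W_E, a vocab x vocab matrix.
ov_circuit = matmul(w_u, matmul(w_ov, w_e))


def logit_effect(src_token):
    """Column src_token: change to each output logit when src_token gets attention weight 1."""
    return [row[src_token] for row in ov_circuit]


print(ov_circuit)
print(logit_effect(0))
```

Each column of the resulting matrix corresponds to one source token and each row to one output token, so reading a column answers "if this token is attended to, which logits move, and by how much?" for that head.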

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

attention computation · concept · 0.731
Process that uses Q, K, and V to compute a heat map over the keys and a weighted sum of the values.
Value-Weighted Attention Pattern · concept · 0.715
Attention patterns scaled by the norm of the value vector at each source position, showing how large a vector is moved from each position.
Causal Attention Mask · method · 0.708
Modification to transformer restricting keys and values to previous time-steps only, mimicking how an agent accumulates experiences.
Value-Weighted Attention Pattern Visualization · method · 0.704
Visualizing attention patterns weighted by the norm of value vectors to better show how much information is moved from each position.
Attention algorithms are usually distributed across attention heads · claim · 0.701
Claim supported by VPD's recovery of cross-head attention subcomponents, noted in footnote.
Identification of algorithms implemented in attention layers, distributed across attention heads · finding · 0.696
VPD successfully recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.
attention mechanism · concept · 0.696
Core operation in transformers, computing weighted combinations of previous elements.
Attention probes for belief decoding · concept · 0.695