Value-Weighted Attention Pattern

Attention patterns in which each attention weight is scaled by the norm of the value vector at its source position, showing how large a vector is actually moved from each position.
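The definition above can be illustrated with a minimal sketch. It assumes a single attention head, a row-stochastic pattern indexed as [query position][source position], and one value vector per source position. The function name `value_weighted_attention` and the toy numbers are hypothetical and chosen only for illustration; they do not come from any particular library.

```python
import math

def value_weighted_attention(pattern, values):
    """Scale each attention weight by the L2 norm of the source value vector.

    pattern: list of rows, pattern[q][s] = attention from query q to source s
    values:  list of value vectors, values[s] = value vector at source s
    """
    # One norm per source position.
    norms = [math.sqrt(sum(x * x for x in v)) for v in values]
    # Multiply every column s of the pattern by the norm of values[s].
    return [[a * n for a, n in zip(row, norms)] for row in pattern]

# Toy example (illustrative numbers only): 3 positions, causal pattern.
pattern = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
]
values = [
    [3.0, 4.0],  # norm 5
    [0.0, 1.0],  # norm 1
    [0.0, 0.0],  # norm 0: attending here moves nothing
]
weighted = value_weighted_attention(pattern, values)
```

In this toy case the last query puts half of its attention on position 2, but because that position's value vector has zero norm, the value-weighted entry is zero. That is the kind of case the raw pattern alone would misrepresent.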

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Value-Weighted Attention Pattern Visualization · method · 0.916
Visualizing attention patterns weighted by the norm of value vectors to better show how much information is moved from each position
attention computation · concept · 0.767
Process using Q, K, V to compute a heat map over K and weighted sum of V.
Focused Attention Cycle · concept · 0.750
Formal model of meditation phenomenology: focus → distraction → awareness of distraction → redirection, derived from active inference.
attention mechanism · concept · 0.742
Core operation in transformers, computing weighted combinations of previous elements
Attention is a generalization of convolution; all convolutions can be expressed as tensor products of fixed relative position attention patterns and weight matrices · claim · 0.741
Mathematical equivalence showing the relationship between attention mechanisms and convolutional operations
Identification of algorithms implemented in attention layers, distributed across attention heads · finding · 0.737
VPD successfully recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.
attentional mode · concept · 0.735
Variable aperture of attention, ranging from collapsed to expanded; human and model analogs.
Attention Decay · concept · 0.730
Decrease in attention paid to system prompt over conversational turns, leading to persona fidelity degradation (cited from Li et al.)