concept
active
concept:value-weighted-attention-pattern
Value-Weighted Attention Pattern
Attention patterns in which each weight is scaled by the norm of the value vector at its source position, showing how large a vector is actually moved from each position rather than only how much attention that position receives.
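A minimal NumPy sketch of the idea, assuming a single attention head with scaled dot-product scores and a causal mask. The mask, the scaling, and the helper names `attention_pattern` and `value_weighted_pattern` are illustrative assumptions, not taken from any specific library. The head moves `A[i, j] * v_j` from source `j` to destination `i`. Because `A[i, j] >= 0`, the norm of that moved vector is exactly `A[i, j] * ||v_j||`, and that is the quantity the value-weighted pattern displays.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pattern(q, k):
    """Causal attention pattern A[i, j] for one head (hypothetical helper).

    q, k: arrays of shape (seq_len, d_head).
    Returns A of shape (seq_len, seq_len); each row sums to 1.
    """
    seq_len, d_head = q.shape
    scores = q @ k.T / np.sqrt(d_head)
    # Assumed causal mask: destination i may only attend to sources j <= i
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores, axis=-1)

def value_weighted_pattern(pattern, v):
    """Scale each source column j of the pattern by ||v_j|| (hypothetical helper).

    pattern: (seq_len, seq_len) attention weights A[i, j]
    v: (seq_len, d_head) value vectors, one per source position
    """
    v_norms = np.linalg.norm(v, axis=-1)  # shape (seq_len,)
    return pattern * v_norms[None, :]     # broadcast across destination rows

# Usage on random toy data
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
A = attention_pattern(q, k)
vw = value_weighted_pattern(A, v)
```

Plotting `vw` instead of `A` down-weights positions that receive high attention but carry small value vectors. Those positions can dominate a raw pattern while contributing little to the head's output.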
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Visualizing attention patterns weighted by the norm of value vectors to better show how much information is moved from each position
- Process that uses queries (Q) and keys (K) to compute a heat map of attention weights over the keys, then takes the weighted sum of the values (V).
- Formal model of meditation phenomenology: focus → distraction → awareness of distraction → redirection, derived from active inference.
- Core operation in transformers, computing weighted combinations of previous elements
- Mathematical equivalence showing the relationship between attention mechanisms and convolutional operations
- Identification of algorithms implemented in attention layers, distributed across attention heads (finding, cosine 0.737): VPD successfully recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.
- Variable aperture of attention, ranging from collapsed to expanded; human and model analogs.
- Decrease in attention paid to system prompt over conversational turns, leading to persona fidelity degradation (cited from Li et al.)