finding
active

A pair of query and key subcomponents distributed across attention heads performs previous-token behavior

VPD recovers an attention algorithm for attending to the previous token, distributed across multiple heads.

Source paper: cimcWhitepaper (relation: extracted_from)

