finding

active

finding:a-pair-of-query-and-key-subcomponents-distributed-across-attention-heads-performs-syntax-boundary-routing

A pair of query and key subcomponents distributed across attention heads performs syntax-boundary routing

VPD recovers an attention algorithm for routing across syntactic boundaries, distributed across heads.

Source paper

extracted_from

cimcWhitepaper

Neighborhood — ranked by edge-count

Claims (1)

claim

Attention algorithms are usually distributed across attention heads
supports
Claim supported by VPD's recovery of cross-head attention subcomponents, noted in footnote.

Communities (4)

community

Mechanistic interpretability & model evaluation
members_of
Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.
Mechanistic structure of transformer attention computations
members_of
Identifies distributed algorithms implemented across attention heads, with focus on causal masking limitations and emergent capabilities via activation manifold steering.
Distributed attention head decomposition
members_of
Mechanistic interpretability approach decomposing attention heads into query/key subcomponents with distinct algorithmic roles
Distributed computation across attention heads
members_of
Studies how query, key, and value components decompose into specialized subfunctions across heads, enabling routing and token prediction behaviors.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
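
The selection rule stated above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is an illustrative sketch only: the page does not say how entities are embedded or how the list is actually computed, so `cosine`, `related_candidates`, and the toy two-dimensional vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def related_candidates(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs, best first, for entities that are similar
    to `target` but have no typed edge to it."""
    scored = []
    for name, vec in embeddings.items():
        # Skip the entity itself and anything already linked by a typed edge.
        if name == target or name in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            scored.append((name, round(score, 3)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings, for illustration only.
embeddings = {
    "finding": [1.0, 0.0],
    "near": [0.8, 0.6],     # similar, no typed edge: kept
    "far": [0.0, 1.0],      # below the threshold: dropped
    "linked": [1.0, 0.0],   # similar but already linked: dropped
}
result = related_candidates("finding", embeddings, typed_edges={"linked"})
```

An entity with an identical embedding is still excluded when a typed edge already exists, which is why the list surfaces only candidates for new edges or unrecognized duplicates.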

Syntax-boundary routing behavior · concept · 0.846
An attention algorithm recovered by VPD that routes information across syntactic boundaries.
A pair of query and key subcomponents distributed across attention heads performs previous-token behavior · finding · 0.831
VPD recovers an attention algorithm for attending to the previous token, distributed across multiple heads.
Attention computations distribute across heads via parameter subcomponents with interpretable roles · finding · 0.798
Mechanistic discovery about how attention mechanisms decompose into interpretable parameter components.
Identification of algorithms implemented in attention layers, distributed across attention heads · finding · 0.784
VPD successfully recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.
attention-based signal routing · concept · 0.772
Mechanism by which attention heads detect injected perturbations and route information about them to the final token position
Some attention heads partially specialize in copying for words that split into two tokens without a space prefix, attending from fragmented token to complete token · finding · 0.761
Interesting special case of copying behavior related to tokenization artifacts; primitive precursor to induction heads
Single-process, non-interruptible task switching at command boundaries is sufficient for responsive single-user systems; avoids complexity of multiprocess synchronization. · hypothesis · 0.760
Design hypothesis that coarse-grained task switching (at commands only) eliminates need for protection mechanisms while maintaining usability.
Q/K/V values function as information routing: Q queries past, K signals future attention, V carries selectively routed information. · claim · 0.755
Janus's interpretive model for how attention mechanisms enable deliberate information flow and selective routing.
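
The routing picture in the last entry (queries look back at earlier positions, keys advertise what can be attended to, values carry what is moved) corresponds to ordinary causal scaled dot-product attention. The sketch below is a minimal single-head version in plain Python, written for illustration; it is standard attention, not VPD and not a reconstruction of the interpretive model named above, and `causal_attention` and the toy inputs are hypothetical.

```python
import math

def causal_attention(Q, K, V):
    """Single-head causal attention over lists of vectors.

    Returns (outputs, weights); weights[i] has i + 1 entries because
    position i can only attend to positions 0..i.
    """
    d = len(Q[0])
    outputs, weights = [], []
    for i, q in enumerate(Q):
        # Query i is scored against keys from the past and itself only.
        logits = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        # Numerically stable softmax over the visible positions.
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        w = [e / total for e in exps]
        weights.append(w)
        # Values are mixed according to the attention weights.
        outputs.append([sum(w[j] * V[j][c] for j in range(i + 1))
                        for c in range(len(V[0]))])
    return outputs, weights

# Toy inputs: three positions, two-dimensional queries/keys, scalar values.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0], [2.0], [3.0]]
outputs, weights = causal_attention(Q, K, V)
```

The first position can only attend to itself, so its output is exactly its own value; later positions blend values from every earlier position in proportion to the query-key match.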