hypothesis
active
hypothesis:the-primer-architecture-s-depthwise-convolution-change-would-allow-induction-heads-to-form-without-requiring-k-composition

The Primer architecture's depthwise convolution change would allow induction heads to form without requiring K-composition

Architectural interpretation of how Primer's design change relates to the paper's mechanistic theory of induction heads

Source paper

extracted_from
A Mathematical Framework for Transformer Circuits (2021)

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • A transformer variant discovered via automated architecture search that applies a depthwise convolution over the last three positions when computing keys and queries, making induction heads expressible without K-composition
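
The mechanism described in this concept can be illustrated with a toy sketch. It is not Primer's actual implementation: it assumes one-hot token embeddings, identity query/key projections, no softmax, and a hand-set width-3 filter that puts all of its weight on the previous position. The names `causal_depthwise_conv`, `one_hot`, and `kernel` are hypothetical. The point is only that once keys are mixed over the last three positions, a single attention layer can match the current token against the token that preceded each key position, which is the job K-composition with a previous-token head would otherwise do.

```python
def causal_depthwise_conv(x, kernel):
    """Causal depthwise convolution over the sequence axis.

    x:      list of T vectors, each a list of D floats.
    kernel: list of D per-channel filters; kernel[d][j] weights position t - j,
            so a width-3 filter covers the current and two previous positions.
    """
    T, D = len(x), len(x[0])
    out = [[0.0] * D for _ in range(T)]
    for t in range(T):
        for d in range(D):
            for j, w in enumerate(kernel[d]):
                if t - j >= 0:  # causal: never look ahead, treat t < 0 as zero
                    out[t][d] += w * x[t - j][d]
    return out


def one_hot(i, n):
    return [1.0 if k == i else 0.0 for k in range(n)]


# Toy sequence "A B C D A" with token ids 0..3 (assumed vocabulary of 4).
tokens = [0, 1, 2, 3, 0]
vocab = 4
x = [one_hot(t, vocab) for t in tokens]

# Assumption: every channel's filter selects only the previous position (offset 1).
kernel = [[0.0, 1.0, 0.0] for _ in range(vocab)]

keys = causal_depthwise_conv(x, kernel)  # key at position t now encodes token t-1
queries = x                              # query encodes the current token

# Raw attention scores for the final position against all earlier positions.
last = len(tokens) - 1
scores = [sum(q * k for q, k in zip(queries[last], keys[s])) for s in range(last + 1)]
attended = max(range(last + 1), key=scores.__getitem__)
# attended is position 1: the token right after the earlier "A"
```

Here the final "A" attends to position 1, the "B" that followed the earlier "A", using one layer and no composition between heads. In a trained model the filter and projections would be learned rather than set by hand.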

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.