claim
active
claim:the-direct-path-w-u-w-e-in-larger-transformers-represents-bigram-statistics-not-captured-by-more-general-grammatical-rules

The direct path W_U W_E in larger transformers represents bigram statistics not captured by more general grammatical rules

An interpretation of the role of the direct path in multi-layer transformers: for example, it encodes that 'Barack' is often followed by 'Obama'.
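As a rough illustration of the claim, the sketch below builds a toy direct path and reads a bigram preference off it. Everything here is an assumption made for illustration: the four-token vocabulary, the one-hot embedding, the hand-set unembedding weights, and the helper name `top_next_token` are hypothetical and not taken from the source paper. The only convention borrowed from the paper is the shape: W_E maps tokens into the residual stream and W_U maps it back to logits, so W_U W_E is a vocabulary-by-vocabulary matrix.

```python
import numpy as np

# Hypothetical toy vocabulary (not from the source paper).
vocab = ["Barack", "Obama", "the", "cat"]
n_vocab, d_model = len(vocab), 4

# Assumed toy embedding: one-hot columns, shape (d_model, n_vocab).
W_E = np.eye(d_model, n_vocab)

# Assumed toy unembedding, shape (n_vocab, d_model), hand-set so that
# the "Barack" direction raises the "Obama" logit.
W_U = np.zeros((n_vocab, d_model))
W_U[vocab.index("Obama"), vocab.index("Barack")] = 5.0
W_U[vocab.index("cat"), vocab.index("the")] = 2.0

# Direct path: entry [next, current] is the logit contribution that the
# current token makes to the next-token prediction, bypassing all layers.
direct_path = W_U @ W_E


def top_next_token(current: str) -> str:
    """Return the token the direct path favors most after `current`."""
    column = direct_path[:, vocab.index(current)]
    return vocab[int(np.argmax(column))]


print(top_next_token("Barack"))  # prints "Obama"
```

In a trained model the same product would be read from the learned weights rather than set by hand, and the claim is that what remains in it is mostly token-specific bigram preference of this kind.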

Source paper

extracted_from
A Mathematical Framework for Transformer Circuits (2021)

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.