claim
active
claim:the-direct-path-w-u-w-e-in-larger-transformers-represents-bigram-statistics-not-captured-by-more-general-grammatical-rules
The direct path W_U W_E in larger transformers represents bigram statistics not captured by more general grammatical rules
Interpretation of the role of the direct path in multi-layer transformers; e.g. encoding that 'Barack' is often followed by 'Obama'
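A minimal sketch of what "the direct path encodes bigram statistics" means, using NumPy. The four-word vocabulary, the score table `B`, and the helper `direct_path_logits` are hypothetical and chosen only for illustration. The sketch assumes the column-vector convention in which W_E has shape (d_model, n_vocab) and W_U has shape (n_vocab, d_model), so that W_U W_E is an n_vocab × n_vocab table whose entry [next, current] is the logit contribution for `next` given `current`. Real models learn this product jointly with everything else; here it is constructed by factoring a hand-written table.

```python
import numpy as np

# Hypothetical toy vocabulary (illustration only).
vocab = ["Barack", "Obama", "the", "cat"]
idx = {tok: i for i, tok in enumerate(vocab)}

# Hypothetical bigram score table: B[next, cur] is the logit for
# `next` immediately following `cur`.
B = np.zeros((len(vocab), len(vocab)))
B[idx["Obama"], idx["Barack"]] = 5.0
B[idx["cat"], idx["the"]] = 3.0

# Factor B into an unembedding and an embedding via SVD, so that
# W_U @ W_E reproduces B exactly (d_model equals vocab size here).
U, S, Vt = np.linalg.svd(B)
W_U = U * np.sqrt(S)             # shape (n_vocab, d_model)
W_E = np.sqrt(S)[:, None] * Vt   # shape (d_model, n_vocab)

# The direct path: embed the current token, then unembed immediately,
# skipping every attention and MLP layer.
direct_path = W_U @ W_E          # shape (n_vocab, n_vocab)

def direct_path_logits(token: str) -> np.ndarray:
    """Logit contribution of the direct path for the given current token."""
    one_hot = np.zeros(len(vocab))
    one_hot[idx[token]] = 1.0
    return W_U @ (W_E @ one_hot)

logits = direct_path_logits("Barack")
predicted = vocab[int(np.argmax(logits))]
print(predicted)
```

Because the direct path sees only the current token, the most it can express is a lookup of this kind: one column of next-token scores per current token, with no dependence on earlier context.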
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Interpretive claim connecting exponential path combinatorics to Lindsey's layer-dependent findings.
- Transformer can be viewed as a Wolfram causal graph with foliations specifying computation order. · claim · 0.770 · Janus's interpretive framing of transformers as causal graphs.
- TEM's path-integration representation g plays the role of position encodings in transformers. · claim · 0.764 · Key structural correspondence claim linking the neuroscience model's spatial representation to the ML concept of position encoding.
- Antra's foundational claim about how introspection arises computationally rather than from memorised text.
- Claim formalizing the Anima Labs idea that transformers are effectively recurrent due to K/V stream.
- Evidence that in-context learning is not mere pattern matching but genuine optimization, relevant to applying the thesis to inference
- Interpretive claim from attention head attribution analysis in appendix
- Prior finding from Grant et al. 2025 used to interpret low MAS IIA for GRU-Transformer hidden state comparisons.