finding
active
Theorem 2: Transformers whose weights are initialised randomly and independently from a continuous distribution are almost surely injective at initialisation, up to each layer
Supports input-injectivity assumption for transformers at initialisation
Source paper
extracted_from · (2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- Input-Injectivity · supports · Assumption that DNN layers preserve input information by being injective; key condition for Theorem 1
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Transformers almost surely maintain input-injectivity throughout training, not just at initialisation · hypothesis · 0.790 · Conjecture supported by Nikolaou et al. 2025 for last-token hidden states
- Interpretive claim connecting exponential path combinatorics to Lindsey's layer-dependent findings.
- Claim formalizing the Anima Labs idea that transformers are effectively recurrent due to K/V stream.
- Antra's foundational claim about how introspection arises computationally rather than from memorised text.
- First result in the hierarchy: the simplest possible transformer does nothing more than learn which tokens follow which
- Interpretive claim from attention head attribution analysis in appendix
- LeCun's post on X supporting the view that fixed-step probabilistic prediction precludes consciousness in LLMs.
- Core claim for two-layer models; composition creates qualitatively more powerful in-context learning