hypothesis
active
hypothesis:transformers-almost-surely-maintain-input-injectivity-throughout-training-not-just-at-initialisation
Transformers almost surely maintain input-injectivity throughout training, not just at initialisation
Conjecture supported by Nikolaou et al. 2025 for last-token hidden states
Source paper
extracted_from · (2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Evidence that in-context learning is not mere pattern matching but genuine optimization, relevant to applying the thesis to inference
- Antra's foundational claim about how introspection arises computationally rather than from memorised text.
- Supports input-injectivity assumption for transformers at initialisation
- We hypothesize that a very high number of training tokens may allow the transformer to learn cleaner representations in superposition · hypothesis · 0.781 · Motivation for heavily overtraining the one-layer transformer on 100 billion tokens
- Janus's claim linking path redundancy to interferometric phenomenology.
- Prior finding from Grant et al. 2025 used to interpret low MAS IIA for GRU-Transformer hidden state comparisons.
- Interpretive claim connecting exponential path combinatorics to Lindsey's layer-dependent findings.
- Empirical example where memories remain despite drastic refactoring of brain tissue and body; demonstrates need for creative reinterpretation rather than passive storage.