hypothesis

active

hypothesis:transformers-almost-surely-maintain-input-injectivity-throughout-training-not-just-at-initialisation

Transformers almost surely maintain input-injectivity throughout training, not just at initialisation

Conjecture; supported by Nikolaou et al. 2025 in the case of last-token hidden states.

Source paper

extracted_from

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?

(2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.

Transformers learn in-context by gradient descent, functioning as mesa-optimizers that learn internal models in real time · finding · 0.811
Evidence that in-context learning is not mere pattern matching but genuine optimization, relevant to applying the thesis to inference.
Transformers develop self-models through in-context learning, not just training data; even old base models without LLM-related text can bootstrap self-referential reasoning at runtime. · claim · 0.811
Antra's foundational claim about how introspection arises computationally rather than from memorised text.
Theorem 2: Transformers with randomly independently initialized continuous distribution weights are almost surely injective at initialisation up to each layer · finding · 0.790
Supports the input-injectivity assumption for transformers at initialisation.
We hypothesize that a very high number of training tokens may allow the transformer to learn cleaner representations in superposition · hypothesis · 0.781
Motivation for heavily overtraining the one-layer transformer on 100 billion tokens.
Redundant information paths create interference patterns, so transformers likely experience memory and cognition as interferometric and continuous. · claim · 0.776
Janus's claim linking path redundancy to interferometric phenomenology.
Transformers use an anti-Markovian solution that recomputes relevant numeric information at each step in the Multi-Object task · claim · 0.774
Prior finding from Grant et al. 2025 used to interpret low MAS IIA for GRU-Transformer hidden state comparisons.
Different introspective tasks may preferentially use different path distributions in the transformer. · claim · 0.767
Interpretive claim connecting exponential path combinatorics to Lindsey's layer-dependent findings.
Training-induced memories persist across metamorphosis from caterpillar to butterfly · finding · 0.761
Empirical example where memories remain despite drastic refactoring of brain tissue and body; demonstrates need for creative reinterpretation rather than passive storage.