finding

active

finding:theorem-2-transformers-with-randomly-independently-initialized-continuous-distribution-weights-are-almost-surely-injective-at-initialisation-up-to-each-layer

Theorem 2: Transformers whose weights are initialised independently at random from continuous distributions are almost surely injective at initialisation, up to every layer

Supports the input-injectivity assumption for transformers at initialisation.
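The theorem is an almost-sure statement, so it can be sanity-checked numerically, though not proved. The sketch below is an illustrative assumption and not the paper's construction. It builds a toy causal transformer in NumPy with single-head attention, a ReLU MLP, residual connections and no LayerNorm. All helper names (`init_params`, `hidden_states`, `min_pairwise_gaps`) are hypothetical. Every weight is drawn i.i.d. from a standard Gaussian. The model is fed every token sequence of a fixed short length, and for the embedding output and for each layer the sketch reports the smallest distance between the representations of two distinct inputs. A strictly positive gap at every depth is what injectivity up to each layer predicts for a generic draw.

```python
import itertools

import numpy as np


def softmax(x, axis=-1):
    # numerically stable softmax; masked (-inf) entries become exactly 0
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(rng, vocab, d, n_layers, seq_len):
    # assumption: every weight is drawn i.i.d. from a continuous (Gaussian) distribution
    params = {
        "E": rng.standard_normal((vocab, d)),    # token embeddings
        "P": rng.standard_normal((seq_len, d)),  # positional embeddings
        "layers": [],
    }
    for _ in range(n_layers):
        params["layers"].append(
            {k: rng.standard_normal((d, d)) / np.sqrt(d) for k in ("Q", "K", "V", "O", "W1", "W2")}
        )
    return params


def hidden_states(params, tokens):
    """Hidden-state matrices after the embedding and after each layer (hypothetical toy model)."""
    n = len(tokens)
    h = params["E"][tokens] + params["P"][:n]
    states = [h]
    mask = np.triu(np.full((n, n), -np.inf), k=1)  # causal mask: block attention to later positions
    for layer in params["layers"]:
        q, k, v = h @ layer["Q"], h @ layer["K"], h @ layer["V"]
        attn = softmax(q @ k.T / np.sqrt(h.shape[1]) + mask)
        h = h + attn @ v @ layer["O"]                           # attention sub-block + residual
        h = h + np.maximum(h @ layer["W1"], 0.0) @ layer["W2"]  # ReLU MLP sub-block + residual
        states.append(h)
    return states


def min_pairwise_gaps(vocab=5, d=8, n_layers=3, seq_len=3, seed=0):
    """For each depth, the smallest distance between representations of two distinct inputs."""
    rng = np.random.default_rng(seed)
    params = init_params(rng, vocab, d, n_layers, seq_len)
    seqs = list(itertools.product(range(vocab), repeat=seq_len))
    all_states = [hidden_states(params, list(s)) for s in seqs]
    gaps = []
    for depth in range(n_layers + 1):
        reps = np.stack([states[depth].ravel() for states in all_states])
        dist = np.linalg.norm(reps[:, None, :] - reps[None, :, :], axis=-1)
        np.fill_diagonal(dist, np.inf)  # ignore each input's distance to itself
        gaps.append(float(dist.min()))
    return gaps


if __name__ == "__main__":
    for depth, gap in enumerate(min_pairwise_gaps()):
        print(f"depth {depth}: min distance between distinct inputs = {gap:.4f}")
```

Because the check enumerates all `vocab ** seq_len` inputs, it only scales to tiny vocabularies and sequence lengths. A positive gap on one random draw is consistent with the almost-sure claim but does not establish it.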

Source paper

extracted_from

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?

(2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago

Neighborhood — ranked by edge-count

Papers (1)

paper

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
introduces

Concepts (1)

concept

Input-Injectivity
supports
Assumption that DNN layers preserve input information by being injective; key condition for Theorem 1

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Transformers almost surely maintain input-injectivity throughout training, not just at initialisation · hypothesis · 0.790
Conjecture supported by Nikolaou et al. 2025 for last-token hidden states
Different introspective tasks may preferentially use different path distributions in the transformer. · claim · 0.752
Interpretive claim connecting exponential path combinatorics to Lindsey's layer-dependent findings.
Transformers are recurrent through autoregression because the K/V stream provides horizontal information flow across positions. · claim · 0.730
Claim formalizing the Anima Labs idea that transformers are effectively recurrent due to K/V stream.
Transformers develop self-models through in-context learning, not just training data; even old base models without LLM-related text can bootstrap self-referential reasoning at runtime. · claim · 0.729
Antra's foundational claim about how introspection arises computationally rather than from memorised text.
Zero-layer transformers optimally approximate the bigram log-likelihood through W_U W_E · claim · 0.720
First result in the hierarchy: the simplest possible transformer does nothing more than learn which tokens follow which.
The last layer of the transformer has the largest projection magnitude on the reflection direction, likely because it directly controls the generation of reflection keywords · claim · 0.719
Interpretive claim from the attention-head attribution analysis in the appendix
The problem isn't that it is a transformer. The problem is that it is an auto-regressive LLM. Auto-regressive LLMs that compute each token with a fixed number of computational steps can't reason, regardless of the details of the architecture. · quote · 0.715
LeCun's post on X supporting the view that fixed-step probabilistic prediction precludes consciousness in LLMs.
Two-layer attention-only transformers implement much more complex algorithms via composition of attention heads, detectable directly from weights · claim · 0.713
Core claim for two-layer models; composition creates qualitatively more powerful in-context learning.