claim

active

claim:the-transformer-likely-uses-a-local-code-for-token-in-context-features-rather-than-purely-compositional-representations-because-local-codes-enable-sharper-predictions

The transformer likely uses a local code for token-in-context features rather than purely compositional representations, because local codes enable sharper predictions

The authors argue that the prevalence of token-in-context features reflects genuine model computation rather than a dictionary-learning artifact

Source paper

extracted_from

Towards Safe and Honest AI Agents with Neural Self-Other Overlap

(2024) · Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1

Neighborhood — ranked by edge-count

Findings (1)

finding

In A/4, over 100 features primarily respond to the token 'the' in different contexts
associated_with
Demonstrates prevalence of token-in-context features and feature splitting of common tokens

Questions (1)

question

Does the transformer genuinely use a local code for token-in-context features, or is dictionary learning producing a local-code artifact from a compositional underlying representation?
gates
Open question about the nature of the abundant token-in-context features found

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Different introspective tasks may preferentially use different path distributions in the transformer. · claim · 0.779
Interpretive claim connecting exponential path combinatorics to Lindsey's layer-dependent findings.
The problem isn't that it is a transformer. The problem is that it is an auto-regressive LLM. Auto-regressive LLMs that compute each token with a fixed number of computational steps can't reason, regardless of the details of the architecture. · quote · 0.768
LeCun's post on X supporting the view that fixed-step probabilistic prediction precludes consciousness in LLMs.
We hypothesize that a very high number of training tokens may allow the transformer to learn cleaner representations in superposition · hypothesis · 0.763
Motivation for heavily overtraining the one-layer transformer on 100 billion tokens
Transformers learn in-context by gradient descent, functioning as mesa-optimizers that learn internal models in real time · finding · 0.760
Evidence that in-context learning is not mere pattern matching but genuine optimization, relevant to applying the thesis to inference
Learning to encode position for transformer with continuous dynamical model (Liu et al., 2020) · concept · 0.756
Prior work on learned dynamic position encodings; cited alongside Wang et al. as precedent.
Transformers develop self-models through in-context learning, not just training data; even old base models without LLM-related text can bootstrap self-referential reasoning at runtime. · claim · 0.752
Antra's foundational claim about how introspection arises computationally rather than from memorised text.
Decoder-only transformer architectures are fundamentally limited in generating long, coherent sequences due to lack of ordered phase. · claim · 0.748
Interpretation of Proposition 2 as a fundamental limitation on LLMs
Transformers almost surely maintain input-injectivity throughout training, not just at initialisation · hypothesis · 0.746
Conjecture supported by Nikolaou et al. 2025 for last-token hidden states
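
The "related by similarity" list above is described as entities with cosine ≥ 0.65 and no typed edge to this one. A minimal sketch of that kind of filter follows, assuming each entity has an embedding vector and that typed edges are stored as unordered pairs of entity ids. The function names, the `embeddings` and `typed_edges` structures, and the default threshold parameter are hypothetical illustrations, not the actual implementation behind this page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs similar to the target but not linked to it.

    embeddings:  dict mapping entity id -> list of floats (assumed structure)
    typed_edges: set of frozenset({id_a, id_b}) pairs (assumed structure)
    """
    target_vec = embeddings[target_id]
    results = []
    for entity_id, vec in embeddings.items():
        if entity_id == target_id:
            continue  # skip the entity itself
        if frozenset((target_id, entity_id)) in typed_edges:
            continue  # already connected by a typed relation
        score = cosine(target_vec, vec)
        if score >= threshold:
            results.append((entity_id, score))
    # Highest similarity first, matching the descending scores in the list above
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter are candidates only; whether one deserves a new typed edge or is an unrecognized duplicate still needs a manual check.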