question

active


Does the transformer genuinely use a local code for token-in-context features, or is dictionary learning producing a local-code artifact from a compositional underlying representation?

Open question about the nature of the abundant token-in-context features found by dictionary learning.

Source paper

extracted_from

Towards Safe and Honest AI Agents with Neural Self-Other Overlap

(2024) · Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1

Neighborhood (ranked by edge count)

Claims (1)

claim

The transformer likely uses a local code for token-in-context features rather than purely compositional representations, because local codes enable sharper predictions
gates
The authors argue that the prevalence of token-in-context features reflects genuine model computation rather than a dictionary-learning artifact.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Transformers develop self-models through in-context learning, not just training data; even old base models without LLM-related text can bootstrap self-referential reasoning at runtime. · claim · 0.776
Antra's foundational claim about how introspection arises computationally rather than from memorised text.
Transformers learn in context by gradient descent, functioning as mesa-optimizers that learn internal models in real time · finding · 0.771
Evidence that in-context learning is not mere pattern matching but genuine optimization; relevant to applying the thesis to inference.
The problem isn't that it is a transformer. The problem is that it is an auto-regressive LLM. Auto-regressive LLMs that compute each token with a fixed number of computational steps can't reason, regardless of the details of the architecture. · quote · 0.769
LeCun's post on X supporting the view that fixed-step probabilistic prediction precludes consciousness in LLMs.
We hypothesize that a very high number of training tokens may allow the transformer to learn cleaner representations in superposition · hypothesis · 0.756
Motivation for heavily overtraining the one-layer transformer on 100 billion tokens
Transformers almost surely maintain input-injectivity throughout training, not just at initialisation · hypothesis · 0.749
Conjecture supported by Nikolaou et al. 2025 for last-token hidden states
Learning to encode position for transformer with continuous dynamical model (Liu et al., 2020) · concept · 0.740
Prior work on learned dynamic position encodings; cited alongside Wang et al. as precedent.
Different introspective tasks may preferentially use different path distributions in the transformer. · claim · 0.737
Interpretive claim connecting exponential path combinatorics to Lindsey's layer-dependent findings.
Transformer cognition operates via interference patterns across redundant paths, encoding nuanced state deltas similar to continuous human memory. · claim · 0.734
Proposes transformers experience cognition as interference-based and continuous; connects to Anima Labs reports of parallel processing.