claim

active

claim:transformers-use-an-anti-markovian-solution-that-recomputes-relevant-numeric-information-at-each-step-in-the-multi-object-task

Transformers use an anti-Markovian solution that recomputes relevant numeric information at each step in the Multi-Object task

Prior finding from Grant et al. 2025, used to interpret the low interchange intervention accuracy (IIA) that Model Alignment Search (MAS) yields when comparing GRU and Transformer hidden states.
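As a rough illustration of the distinction the claim draws, the sketch below contrasts a Markovian strategy (carry a running count forward in a state) with an anti-Markovian one (recompute the count from the full visible history at every step). The toy counting setup, the token names, and both function names are assumptions made for illustration; they are not the Multi-Object task or any code from Grant et al.

```python
from typing import List


def markovian_count(tokens: List[str]) -> List[int]:
    """Markovian sketch: each step reads only the previous state (GRU-like)."""
    state = 0
    counts = []
    for tok in tokens:
        # Update the carried state from the current token alone.
        state += 1 if tok == "obj" else 0
        counts.append(state)
    return counts


def anti_markovian_count(tokens: List[str]) -> List[int]:
    """Anti-Markovian sketch: recompute from the whole history at every step."""
    counts = []
    for i in range(len(tokens)):
        # No carried state: re-read every token visible so far.
        counts.append(sum(1 for t in tokens[: i + 1] if t == "obj"))
    return counts


# Hypothetical toy sequence; "obj" marks an item to be counted.
tokens = ["obj", "obj", "other", "obj"]
assert markovian_count(tokens) == anti_markovian_count(tokens)
```

Both functions produce identical outputs, yet only the first stores the count in a state that is passed from step to step. This is the kind of difference the page describes as visible to a causal method such as MAS but not to correlative ones.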

Source paper

extracted_from

Model Alignment Search

(2025) · Satchel Grant

Neighborhood — ranked by edge-count

Findings (1)

finding

MAS IIA is low for GRU hidden states vs Transformer hidden states on Multi-Object task, consistent with anti-Markovian transformer solution
supports
Validates MAS as a causal detector of representational differences invisible to correlative methods.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Transformers learn in-context by gradient descent, functioning as mesa-optimizers that learn internal models in real time · finding · 0.797
Evidence that in-context learning is not mere pattern matching but genuine optimization, relevant to applying the thesis to inference
Transformers are recurrent through autoregression because K/V stream provides horizontal information flow across positions. · claim · 0.792
Claim formalizing the Anima Labs idea that transformers are effectively recurrent due to K/V stream.
Transformers develop self-models through in-context learning, not just training data; even old base models without LLM-related text can bootstrap self-referential reasoning at runtime. · claim · 0.776
Antra's foundational claim about how introspection arises computationally rather than from memorised text.
Transformers almost surely maintain input-injectivity throughout training, not just at initialisation · hypothesis · 0.774
Conjecture supported by Nikolaou et al. 2025 for last-token hidden states
Different introspective tasks may preferentially use different path distributions in the transformer. · claim · 0.772
Interpretive claim connecting exponential path combinatorics to Lindsey's layer-dependent findings.
Anti-Markovian Solution · concept · 0.769
Strategy used by transformers that recomputes relevant numeric information at each step, unlike Markovian GRU solutions; detected by MAS but not by RSA/CKA.
Redundant information paths create interference patterns, so transformers likely experience memory and cognition as interferometric and continuous. · claim · 0.767
Janus's claim linking path redundancy to interferometric phenomenology.
Two-layer attention-only transformers implement much more complex algorithms via composition of attention heads, detectable directly from weights · claim · 0.755
Core claim for two-layer models; composition creates qualitatively more powerful in-context learning