Shallow Transformer (RoPE-based)

Two-layer transformer with rotary positional encodings used in numeric task experiments.
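
For orientation, rotary positional encoding can be sketched as a position-dependent rotation of consecutive coordinate pairs in a query or key vector. The snippet below is a minimal illustration, not the configuration used in the experiments: the interleaved pairing, the base of 10000, the function names `rope` and `dot`, and the toy vectors are all assumptions made for this example.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles that grow with `pos`.

    Assumed variant: interleaved pairs (0,1), (2,3), ... with per-pair
    frequency base ** (-i / d), where i is the even index of the pair.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector dimension must be even")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # rotation angle for this pair
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])  # 2-D rotation
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy query and key vectors.
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.1]

# The same relative offset (3) at two different absolute positions.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 105), rope(k, 102))
```

Under these assumptions, `score_a` and `score_b` agree up to floating-point error, because the score between a rotated query and a rotated key depends only on the difference of their positions.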

Neighborhood — ranked by edge-count

paper

concept

Anti-Markovian Solution
associated_with
Strategy used by transformers that recomputes relevant numeric information at each step, unlike Markovian GRU solutions; detected by MAS but not by RSA/CKA.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Signal Transformer · concept · 0.744
Core abstraction in Fruit: pure function mapping signals to signals; enables compositional GUI definitions.
Zero-Layer Transformer · concept · 0.739
A transformer with no attention layers; shown to model bigram statistics via T = W_U W_E.
Decision Transformer · method · 0.724
A model that frames reinforcement learning (RL) as sequence modeling; described as achieving state-of-the-art results from random trajectories.
The transformer likely uses a local code for token-in-context features rather than purely compositional representations, because local codes enable sharper predictions · claim · 0.711
The authors argue that the prevalence of token-in-context features reflects genuine model computation rather than a dictionary-learning artifact.
Reconstructed Transformer NLL · concept · 0.701
Metric measuring the fraction of the MLP's loss contribution explained by the autoencoder, computed by replacing MLP activations with autoencoder outputs.
TEM-Transformer (TEM-t) · framework · 0.701
The transformer version directly analogous to TEM, introduced in this paper, offering dramatic performance improvements.
self-model (transformer) · concept · 0.699
The transformer's model of itself as a predictive text engine, developed through in-context learning.
The transformer entity is tricameral (base simulator, simulated simulator, simulated awareness), but there is less discreteness between these layers than previously claimed. · claim · 0.697
Antra's revision of her earlier model; still considers interference between levels important.