Shallow Transformer (RoPE-based)
framework · active · ID: framework:shallow-transformer-rope-based
A two-layer transformer with rotary positional encodings (RoPE), used in the numeric-task experiments.
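For readers unfamiliar with rotary positional encodings, here is a minimal sketch of the general technique. It assumes the common formulation (consecutive feature pairs rotated by `position * base**(-2i/d)`, with `base = 10000`); the source does not say which RoPE variant or hyperparameters the experiments used, and the helper names `rope_rotate` and `attention_score` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def rope_rotate(x: Vector, position: int, base: float = 10000.0) -> Vector:
    """Rotate each pair (x[2i], x[2i+1]) by angle position * base**(-2i/d).

    Assumption: the standard interleaved-pair RoPE convention with base 10000.
    """
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even feature dimension")
    out = [0.0] * d
    for i in range(0, d, 2):  # i plays the role of 2 * (pair index)
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out


def attention_score(q: Vector, k: Vector, q_pos: int, k_pos: int) -> float:
    """Dot product of rotated query and key; depends only on q_pos - k_pos."""
    rq = rope_rotate(q, q_pos)
    rk = rope_rotate(k, k_pos)
    return sum(a * b for a, b in zip(rq, rk))


q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]
print(attention_score(q, k, 5, 2), attention_score(q, k, 13, 10))
```

Because each pair is rotated rather than shifted, vector norms are preserved, and the query-key score depends only on the relative offset between positions (both printed scores use an offset of 3, so they agree up to floating-point error).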
Neighborhood (ranked by edge count)
Papers (1)
- Model Alignment Search (mentions)
Concepts (1)
- Anti-Markovian Solution (associated_with): A strategy used by transformers that recomputes the relevant numeric information at each step, unlike the Markovian solutions found in GRUs; detected by MAS (Model Alignment Search) but not by RSA or CKA.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core abstraction in Fruit: pure function mapping signals to signals; enables compositional GUI definitions.
- A transformer with no attention layers; shown to model bigram statistics via T = W_U W_E (the unembedding matrix times the embedding matrix).
- A model that frames RL as sequence modeling; reported to reach state-of-the-art results from random trajectories.
- The authors argue that the prevalence of token-in-context features reflects genuine model computation rather than a dictionary-learning artifact.
- Metric measuring the fraction of the MLP's loss contribution explained by the autoencoder, computed by replacing MLP activations with autoencoder outputs.
- A transformer directly analogous to TEM, introduced in its source paper and reported to give dramatic performance improvements.
- The transformer's model of itself as a predictive text engine, developed through in-context learning.
- Antra's revision of her earlier model, which still treats interference between levels as important.
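The "Related by similarity" list appears to combine two filters: a cosine-similarity threshold of 0.65 and the absence of a typed edge. A minimal sketch of that selection, assuming each entity has an embedding vector and typed edges are stored as unordered name pairs (the function `similar_untyped_neighbors` and the toy data are hypothetical, not the page's actual implementation):

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def similar_untyped_neighbors(
    entity: str,
    embeddings: Dict[str, List[float]],
    typed_edges: Set[Tuple[str, str]],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Entities with cosine >= threshold and no typed edge, most similar first."""
    target = embeddings[entity]
    # Typed edges are treated as undirected for this check (an assumption).
    linked = {b for a, b in typed_edges if a == entity}
    linked |= {a for a, b in typed_edges if b == entity}
    hits = []
    for other, vec in embeddings.items():
        if other == entity or other in linked:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other, score))
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits


toy_embeddings = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0], "d": [1.0, 0.2]}
toy_edges = {("a", "d")}
print(similar_untyped_neighbors("a", toy_embeddings, toy_edges))
```

In the toy data, "d" is very similar to "a" but already has a typed edge, so only "b" survives; "c" is orthogonal and falls below the threshold.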