Reconstructed Transformer NLL

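Metric that measures the fraction of the MLP's loss contribution explained by the autoencoder, computed by replacing the MLP activations with the autoencoder's outputs and re-evaluating the transformer's negative log-likelihood (NLL).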
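A minimal sketch of how such a fraction could be computed from three loss values. The entry does not state the baseline, so this assumes the MLP's "loss contribution" is the gap between the NLL with the MLP zero-ablated and the NLL of the unmodified model. The function name and the example numbers are hypothetical.

```python
def fraction_of_loss_explained(clean_nll: float,
                               reconstructed_nll: float,
                               zero_ablated_nll: float) -> float:
    """Fraction of the MLP's loss contribution recovered by the autoencoder.

    clean_nll:         NLL of the unmodified transformer.
    reconstructed_nll: NLL with MLP activations replaced by autoencoder outputs.
    zero_ablated_nll:  NLL with MLP activations set to zero
                       (assumed baseline, not stated in the entry).
    """
    # Total loss attributable to the MLP under the assumed baseline.
    mlp_contribution = zero_ablated_nll - clean_nll
    if mlp_contribution <= 0:
        raise ValueError("zero-ablated NLL must exceed the clean NLL")
    # Portion of that contribution retained after substituting the reconstruction.
    return (zero_ablated_nll - reconstructed_nll) / mlp_contribution


# Hypothetical numbers, for illustration only.
score = fraction_of_loss_explained(clean_nll=3.0,
                                   reconstructed_nll=3.2,
                                   zero_ablated_nll=5.0)
print(f"{score:.2f}")
```

A value near 1 would mean the reconstruction preserves almost all of the MLP's effect on the loss. A value near 0 would mean it does little better than removing the MLP.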

Neighborhood — ranked by edge-count

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Zero-Layer Transformer · concept · 0.728
A transformer with no attention layers; shown to model bigram statistics via T = W_U W_E
Signal Transformer · concept · 0.717
Core abstraction in Fruit: pure function mapping signals to signals; enables compositional GUI definitions.
R-Transformer: Recurrent Neural Network Enhanced Transformer (Wang et al., 2019) · concept · 0.712
Prior work on recurrently generated position encodings; cited as precedent for TEM-t's recurrent position encoding method.
transformer architecture · framework · 0.710
Neural network architecture based on attention, commonly used in large language models
Transformer Neural Network · framework · 0.705
Core machine learning architecture analyzed in the paper; shown to be mathematically related to TEM.
Transformer decoder architecture · framework · 0.702
Base architecture of the reasoning LLMs studied, with an attention block and an MLP block in each layer.
Shallow Transformer (RoPE-based) · framework · 0.701
Two-layer transformer with rotary positional encodings used in numeric task experiments.
A Mathematical Framework for Transformer Circuits (Elhage et al., 2021) · concept · 0.694
Foundational mechanistic interpretability paper on transformer circuit analysis