concept
active
concept:reconstructed-transformer-nll
Reconstructed Transformer NLL
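Metric measuring the fraction of the MLP layer's loss contribution that the autoencoder explains. It is obtained by replacing the MLP activations with the autoencoder's outputs and measuring the transformer's resulting negative log-likelihood (NLL).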
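One common way to turn such a measurement into a fraction is sketched below. It assumes three NLL values are available: the unmodified model's, the model with MLP activations replaced by autoencoder outputs, and the model with the MLP layer zero-ablated. The zero-ablation baseline, the function name, and the example numbers are assumptions for illustration; this entry does not specify them.

```python
def fraction_of_loss_explained(clean_nll, reconstructed_nll, zero_ablated_nll):
    """Fraction of the MLP's loss contribution recovered by the autoencoder.

    clean_nll:         NLL of the unmodified transformer
    reconstructed_nll: NLL with MLP activations replaced by autoencoder outputs
    zero_ablated_nll:  NLL with the MLP output set to zero (assumed baseline)
    """
    # The MLP's loss contribution is how much the loss worsens without it.
    mlp_contribution = zero_ablated_nll - clean_nll
    if mlp_contribution <= 0:
        raise ValueError("zero-ablated NLL must exceed clean NLL")
    # The recovered part is how much of that gap the reconstruction closes.
    return (zero_ablated_nll - reconstructed_nll) / mlp_contribution


# Hypothetical numbers, for illustration only.
example = fraction_of_loss_explained(
    clean_nll=3.0, reconstructed_nll=3.5, zero_ablated_nll=5.0
)
print(example)  # 0.75
```

Under this definition, a value of 1.0 means the reconstruction is as good as the original MLP, and 0.0 means it is no better than removing the MLP entirely.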
Neighborhood — ranked by edge-count
Questions (1)
question
- To what extent do interpretable features represent the 'full story' of the MLP layer? · associated_with · Question about completeness of feature-based model explanation
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- A transformer with no attention layers; shown to model bigram statistics via T = W_U W_E
- Core abstraction in Fruit: pure function mapping signals to signals; enables compositional GUI definitions.
- Prior work on recurrently generated position encodings; cited as precedent for TEM-t's recurrent position encoding method.
- Neural network architecture based on attention, commonly used in large language models
- Core machine learning architecture analyzed in the paper; shown to be mathematically related to TEM.
- Base architecture of reasoning LLMs studied, with attention and MLP blocks per layer
- Two-layer transformer with rotary positional encodings used in numeric task experiments.
- Foundational mechanistic interpretability paper on transformer circuit analysis