question
active
question:when-and-how-can-mlp-neurons-in-transformers-be-individually-interpreted-and-what-progress-is-needed-to-extend-mechanistic-interpretability-to-them
When and how can MLP neurons in transformers be individually interpreted, and what progress is needed to extend mechanistic interpretability to them?
Major open problem identified in the paper; MLP layers constitute 2/3 of transformer parameters
Neighborhood — ranked by edge-count
Papers (1)
paper
- A Mathematical Framework for Transformer Circuits · associated_with
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The sparse set of 28 neurons at layer 18 identified as responsible for Fourier feature computation across all cyclic tasks
- Key limitation of the paper's approach; MLP layers make up 2/3 of standard transformer parameters
- Claim from footnote 3, acknowledging neuron-level interpretability while arguing subcomponents are better.
- Structural finding showing modular organization within the sparse neuron set
- Hypothesis based on observed negative cosine similarity between input and output weights of some neurons
- Core summary of Janus' position on autoregressive recurrence enabling introspection.
- Antra's foundational claim about how introspection arises computationally rather than from memorised text.
- Feed-forward neural network with hidden layers, capable of representing non-linearly separable functions.