claim
active
claim:mlp-layers-are-much-harder-to-get-traction-on-than-attention-layers-understanding-them-requires-individually-interpretable-neurons-which-are-rarely-found
MLP layers are much harder to get traction on than attention layers; understanding them requires individually interpretable neurons which are rarely found
Key limitation of the paper's approach; MLP layers make up 2/3 of standard transformer parameters
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (2)
concept
- Polysemanticity · supports · Neurons that respond to multiple unrelated concepts, limiting interpretability.
- GeLU Activation Function · supports · The nonlinear activation function used in MLP layers; prevents the linearization approach used for attention layers from extending to MLP layers
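The GeLU entry above rests on one property: a linear map satisfies f(a + b) = f(a) + f(b), and GeLU does not. The following is a minimal sketch of that check, not the paper's method. The helper names (`gelu`, `linear`, `additivity_gap`), the weight 0.7, and the sample inputs are assumptions chosen for illustration.

```python
import math


def gelu(x: float) -> float:
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x: float, w: float = 0.7) -> float:
    """A scalar linear map, standing in for a path with no nonlinearity."""
    return w * x


def additivity_gap(f, a: float, b: float) -> float:
    """|f(a + b) - (f(a) + f(b))|; zero for every a, b iff f is additive."""
    return abs(f(a + b) - (f(a) + f(b)))


# Illustrative inputs (hypothetical values, not taken from the source).
a, b = 1.0, -2.0
linear_gap = additivity_gap(linear, a, b)
gelu_gap = additivity_gap(gelu, a, b)

print(f"linear gap: {linear_gap:.6f}")
print(f"GeLU gap:   {gelu_gap:.6f}")
```

Because the gap is nonzero for GeLU, the output of an MLP layer cannot be written as a sum of independent per-input contributions, which is the decomposition the linearization approach for attention layers relies on.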
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Feed-forward neural network with hidden layers, capable of representing non-linearly separable functions.
- A sparse set of 28 MLP neurons at layer 18 (~0.2% of MLP) are reused across all cyclic tasks · finding · 0.782 · Quantitative finding identifying the specific neurons responsible for generic addition
- Question about completeness of feature-based model explanation
- Structural finding showing modular organization within the sparse neuron set
- What are the specific attention heads or MLP neurons (circuits) responsible for self-reflection in LLMs? · question · 0.770 · Future research question about pinpointing fine-grained mechanistic components of reflection.
- Hypothesis based on observed negative cosine similarity between input and output weights of some neurons
- Major open problem identified in the paper; MLP layers constitute 2/3 of transformer parameters
- Precise characterization of why polysemanticity poses a combinatorial obstacle to circuit analysis