claim
active
claim:mlp-layers-are-much-harder-to-get-traction-on-than-attention-layers-understanding-them-requires-individually-interpretable-neurons-which-are-rarely-found

MLP layers are much harder to get traction on than attention layers; understanding them requires individually interpretable neurons which are rarely found

Key limitation of the paper's approach; MLP layers make up 2/3 of standard transformer parameters

Source paper

extracted_from
A Mathematical Framework for Transformer Circuits
(2021) ·

Neighborhood — ranked by edge-count

Concepts (2)

concept
  • Neurons that respond to multiple unrelated concepts, limiting interpretability.
  • The nonlinear activation function used in MLP layers; prevents the linearization approach used for attention layers from extending to MLP layers

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.