question
active
question:when-and-how-can-mlp-neurons-in-transformers-be-individually-interpreted-and-what-progress-is-needed-to-extend-mechanistic-interpretability-to-them

When and how can MLP neurons in transformers be individually interpreted, and what progress is needed to extend mechanistic interpretability to them?

Major open problem identified in the source paper: MLP layers make up about two-thirds of a standard transformer's parameters, yet the paper's analysis covers attention-only models.
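The two-thirds figure can be checked with a rough parameter count. The sketch below is not from the paper. It assumes a standard block with four d_model × d_model attention matrices (query, key, value, output) and an MLP hidden width of 4 × d_model, and it ignores biases, layer norms, and embeddings. The function name and the `mlp_ratio` parameter are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def mlp_parameter_fraction(d_model: int, mlp_ratio: int = 4) -> Fraction:
    """Share of per-block weights that sit in the MLP (assumed standard layout)."""
    # Attention: W_Q, W_K, W_V, W_O, each d_model x d_model in total across heads.
    attn_params = 4 * d_model * d_model
    # MLP: one up-projection and one down-projection of size d_model x (mlp_ratio * d_model).
    mlp_params = 2 * d_model * (mlp_ratio * d_model)
    return Fraction(mlp_params, attn_params + mlp_params)


print(mlp_parameter_fraction(768))  # 2/3
```

Under these assumptions the ratio does not depend on d_model: the MLP holds 8 d_model² weights against 4 d_model² for attention. A narrower MLP (a smaller `mlp_ratio`) lowers the share accordingly.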

Source paper

extracted_from
A Mathematical Framework for Transformer Circuits
(2021)
