claim
active
claim:some-mlp-neurons-and-attention-heads-perform-memory-management-by-reading-residual-stream-information-and-writing-its-negative-to-delete-it
Some MLP neurons and attention heads perform memory management by reading residual stream information and writing its negative to delete it
A hypothesis based on the observed negative cosine similarity between the input and output weights of some neurons
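The measurement behind this hypothesis can be sketched as follows. This is a minimal illustration, not the paper's code. It assumes a standard two-matrix MLP in which column j of `W_in` is the residual-stream direction neuron j reads and row j of `W_out` is the direction it writes. The function name `in_out_cosine`, the toy weights, and the -0.5 cutoff are all hypothetical. It covers MLP neurons only; the attention-head analogue would compare directions through each head's OV circuit instead.

```python
import numpy as np

def in_out_cosine(W_in: np.ndarray, W_out: np.ndarray) -> np.ndarray:
    """Cosine similarity between each MLP neuron's read and write directions.

    Assumed (hypothetical) layout:
      W_in  has shape (d_model, d_mlp); column j is what neuron j reads.
      W_out has shape (d_mlp, d_model); row j is what neuron j writes.
    """
    read = W_in.T                                   # (d_mlp, d_model)
    write = W_out                                   # (d_mlp, d_model)
    dots = np.sum(read * write, axis=1)
    norms = np.linalg.norm(read, axis=1) * np.linalg.norm(write, axis=1)
    return dots / np.maximum(norms, 1e-12)          # guard against all-zero (dead) neurons

# Toy weights: d_model = 4, three neurons.
W_in = np.array([[1.0, 0.0, 0.0],
                 [1.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 0.0]])
W_out = np.array([[-2.0, -2.0, 0.0, 0.0],   # neuron 0 writes the negative of what it reads
                  [ 0.0,  0.0, 0.0, 3.0],   # neuron 1 writes an orthogonal direction
                  [ 0.0,  0.5, 0.0, 0.0]])  # neuron 2 writes a scaled copy of what it reads

cos = in_out_cosine(W_in, W_out)
deletion_candidates = np.where(cos < -0.5)[0]  # hypothetical cutoff for "deletion-like" neurons
print(np.round(cos, 3))      # approximately [-1, 0, 1]
print(deletion_candidates)   # [0]
```

A strongly negative cosine means that when the neuron fires with a positive activation, it pushes the residual stream against the same direction that activated it. The claim reads that signature as deletion. The weights alone are suggestive rather than conclusive, because the actual effect also depends on the neuron's activation values.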
Neighborhood (ranked by edge count)
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- What are the specific attention heads or MLP neurons (circuits) responsible for self-reflection in LLMs? · question · cosine 0.817 · Future research question about pinpointing fine-grained mechanistic components of reflection.
- MLP neurons and attention heads hypothesized to delete information from the residual stream by writing the negative of what they read
- Interesting special case of copying behavior related to tokenization artifacts; primitive precursor to induction heads
- Key limitation of the paper's approach; MLP layers make up 2/3 of standard transformer parameters
- A sparse set of 28 MLP neurons at layer 18 (~0.2% of MLP neurons) is reused across all cyclic tasks · finding · cosine 0.763 · Quantitative finding identifying the specific neurons responsible for generic addition
- Mathematical equivalence enabling independent analysis of each attention head
- Empirical observation from examining expanded OV/QK matrices; approximately 10 out of 12 heads show significant copying
- Major open problem identified in the paper; MLP layers constitute 2/3 of transformer parameters
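The "mathematical equivalence" entry above refers, as we read it, to a standard identity. Concatenating the head outputs and applying one output projection W_O gives the same result as summing each head's independent contribution to the residual stream. Each head's contribution passes through its own OV matrix W_V_h W_O_h, the kind of matrix that is expanded through the embedding and unembedding when looking for copying heads. The NumPy check below is a minimal sketch with random toy weights. All names and shapes (`W_Q`, `W_K`, `W_V`, `W_O`, `pattern`, the sizes) are hypothetical, and it omits the causal mask, biases, and layer norm.

```python
import numpy as np

def softmax(a: np.ndarray) -> np.ndarray:
    a = a - a.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq, d_model, n_heads = 5, 12, 3
d_head = d_model // n_heads

x = rng.normal(size=(seq, d_model))                 # residual stream entering the layer
W_Q = rng.normal(size=(n_heads, d_model, d_head))
W_K = rng.normal(size=(n_heads, d_model, d_head))
W_V = rng.normal(size=(n_heads, d_model, d_head))
W_O = rng.normal(size=(n_heads * d_head, d_model))  # one shared output projection

def pattern(h: int) -> np.ndarray:
    """Attention pattern of head h (no causal mask, for brevity)."""
    scores = (x @ W_Q[h]) @ (x @ W_K[h]).T / np.sqrt(d_head)
    return softmax(scores)                           # (seq, seq)

# View 1: concatenate per-head outputs, then apply W_O once.
z = np.concatenate([pattern(h) @ x @ W_V[h] for h in range(n_heads)], axis=-1)
out_concat = z @ W_O

# View 2: split W_O into per-head row blocks; each head independently adds
# pattern(h) @ x @ W_OV_h to the residual stream, where W_OV_h = W_V[h] @ W_O_h.
W_O_blocks = W_O.reshape(n_heads, d_head, d_model)
out_sum = sum(pattern(h) @ x @ (W_V[h] @ W_O_blocks[h]) for h in range(n_heads))

print(np.allclose(out_concat, out_sum))  # True
```

The identity holds because multiplying a concatenation by W_O is the same as multiplying each head's slice by the matching block of W_O's rows and adding the results. This is what allows each head to be analyzed on its own.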