question
active
question:are-the-resulting-parameter-subcomponents-actually-interpretable-objectsare the resulting parameter subcomponents actually interpretable objects?
First question posed after applying VPD, investigating whether the subcomponents make sense.
Source paper
extracted_fromNeighborhood — ranked by edge-count
Findings (1)
finding
- Specific discovered subcomponent that activates on punctuation like ' :', ' ;', ' =', ':-' and predicts the rest of emoticons/emojis.
Related by similarity (8)
cosine ≥ 0.65 · no typed edgeEntities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
- One of the simple rank-one matrices resulting from VPD that sums with others to reconstruct the original model weights and has a specific functional role.
- Attention computations distribute across heads via parameter subcomponents with interpretable rolesfinding0.783Mechanistic discovery about how attention mechanisms decompose into interpretable parameter components.
- Interpretive claim that the subcomponents correspond to real functional units.
- Assertion about the qualitative advantages of VPD's rank-one decomposition.
- Definitional principle guiding VPD: subcomponents should encode narrow, targeted computational roles rather than distributed, multi-purpose functionality.
- Implicit question driving the editing experiment.
- Motivates shift from studying model activations ('thoughts') to understanding parameters ('the computations themselves').
- Motivating claim that mechanistic explanations add clinical value for VUS.