question

active

question:can-we-use-these-parameter-subcomponents-to-perform-clean-targeted-changes

Can we use these parameter subcomponents to perform clean, targeted changes?

Implicit question driving the editing experiment.

Source paper

extracted_from

cimcWhitepaper

Neighborhood — ranked by edge-count

Findings (1)

finding

Editing the emoticon eye subcomponent to output the unembedding vector for 'o' causes the model to predict shocked faces for all emoticons
answered_by
Direct parameter subcomponent overwrite produces a clean behavioral change without training.
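
The finding above can be illustrated with a minimal toy sketch. It is not the paper's code: it assumes a weight matrix stored as a sum of rank-one subcomponents, each with a read direction and a write direction, and it assumes orthonormal read directions and unit-norm unembedding rows so the effect is easy to see. The names `read`, `write`, `unembed`, `EYE_COMPONENT` and `TOKEN_O`, and all dimensions, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size, n_components = 16, 10, 4

# Toy stand-in for a decomposed weight matrix: W = sum_k outer(write[k], read[k]).
# Assumption: read directions are orthonormal, so an input equal to read[k]
# activates subcomponent k and no other.
q, _ = np.linalg.qr(rng.normal(size=(d_model, n_components)))
read = q.T                                        # (n_components, d_model)
write = rng.normal(size=(n_components, d_model))  # (n_components, d_model)

# Assumption: unit-norm unembedding rows, so a token's own vector scores highest.
unembed = rng.normal(size=(vocab_size, d_model))
unembed /= np.linalg.norm(unembed, axis=1, keepdims=True)


def assemble(write, read):
    """Sum the rank-one subcomponents back into one weight matrix."""
    return write.T @ read


EYE_COMPONENT = 2  # hypothetical index of the "emoticon eye" subcomponent
TOKEN_O = 7        # hypothetical token id for 'o'

# The edit: overwrite only the write direction of one subcomponent. No training.
write_edited = write.copy()
write_edited[EYE_COMPONENT] = unembed[TOKEN_O]

w_original = assemble(write, read)
w_edited = assemble(write_edited, read)

# An input that activates the edited subcomponent now predicts 'o'.
eye_input = read[EYE_COMPONENT]
predicted = int(np.argmax(unembed @ (w_edited @ eye_input)))

# An input that activates a different subcomponent is unaffected.
other_input = read[0]
unchanged = np.allclose(w_edited @ other_input, w_original @ other_input)

print(predicted == TOKEN_O, unchanged)
```

In this toy setting the edit is targeted by construction: only inputs with a component along the edited read direction see a different output. Whether a real model's subcomponents separate this cleanly is the empirical question the finding addresses.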

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Parameter subcomponents cleanly isolate true mechanisms of the model · claim · 0.787
Interpretive claim that the subcomponents correspond to real functional units.
Parameter subcomponent · concept · 0.772
One of the simple rank-one matrices resulting from VPD that sums with others to reconstruct the original model weights and has a specific functional role.
A good parameter subcomponent is causally important only for specific roles and can be removed from the model without hurting performance on irrelevant prompts · claim · 0.760
Definitional principle guiding VPD: subcomponents should encode narrow, targeted computational roles rather than distributed, multi-purpose functionality.
Are the resulting parameter subcomponents actually interpretable objects? · question · 0.756
First question posed after applying VPD, investigating whether the subcomponents make sense.
Model editing via direct subcomponent overwrite · method · 0.750
Technique to alter model behavior by directly editing a parameter subcomponent without training, demonstrated by changing an emoticon eye subcomponent.
Direct model editing via parameter subcomponent modification: emoticon eye recognition altered to predict shocked faces with no retraining · finding · 0.741
Demonstrated that VPD-discovered subcomponents encode true computational machinery by enabling targeted, predictable behavior changes without gradient-based training.
Attention computations distribute across heads via parameter subcomponents with interpretable roles · finding · 0.736
Mechanistic discovery about how attention mechanisms decompose into interpretable parameter components.
Parameters did not change; the binding did. · quote · 0.732
Emphasizes that anchoring is a binding process without weight updates.
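
Two properties in the subcomponent entries above are that the rank-one matrices sum to the original weights and that a good subcomponent can be removed without affecting irrelevant inputs. The sketch below checks both on a toy matrix. It uses an SVD as a stand-in decomposition, which is an assumption made purely for illustration and is not VPD; the matrix size and the `ABLATED` index are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(size=(6, 6))

# Stand-in decomposition (SVD, not VPD):
# weights = sum_k s[k] * outer(u[:, k], vt[k]), one rank-one matrix per k.
u, s, vt = np.linalg.svd(weights, full_matrices=False)
subcomponents = [s[k] * np.outer(u[:, k], vt[k]) for k in range(len(s))]

# The rank-one pieces sum back to the original weights.
reconstructs = np.allclose(sum(subcomponents), weights)

# Ablate one subcomponent by subtracting it from the weights.
ABLATED = 3  # hypothetical index
ablated_weights = weights - subcomponents[ABLATED]

# An input along another subcomponent's read direction is unaffected...
irrelevant_input = vt[0]
irrelevant_ok = np.allclose(
    ablated_weights @ irrelevant_input, weights @ irrelevant_input
)

# ...while the input the ablated subcomponent reads from loses its contribution.
relevant_input = vt[ABLATED]
output_shift = np.linalg.norm((weights - ablated_weights) @ relevant_input)

print(reconstructs, irrelevant_ok, output_shift)
```

The SVD read directions are orthogonal by construction, so the ablation here is perfectly isolated. A learned decomposition offers no such guarantee, which is why the claim above is stated as a criterion for a good subcomponent.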