Model editing via direct subcomponent overwrite

Technique to alter model behavior by directly editing a parameter subcomponent without training, demonstrated by changing an emoticon eye subcomponent.

Neighborhood — ranked by edge-count

Papers (1)

paper

Interpreting Language Model Parameters
introduces

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Model Editingconcept0.779
Technique for modifying model knowledge or behavior via targeted interventions, e.g., ROME by Meng et al.
Direct model editing via parameter subcomponent modification—emoticon eye recognition altered to predict shocked faces with no retrainingfinding0.770
Demonstrated that VPD-discovered subcomponents encode true computational machinery by enabling targeted, predictable behavior changes without gradient-based training.
Manual model editing through parameter manipulationconcept0.763
Application enabled by VPD: direct manipulation of weight matrices for interpretable model modification.
can we use these parameter subcomponents to perform clean, targeted changes?question0.750
Implicit question driving the editing experiment.
Manual model editingconcept0.748
Ability to surgically alter model behavior through direct parameter changes rather than activation interventions.
Parameter subcomponents cleanly isolate true mechanisms of the modelclaim0.727
Interpretive claim that the subcomponents correspond to real functional units.
Parameter subcomponentconcept0.726
One of the simple rank-one matrices resulting from VPD that sums with others to reconstruct the original model weights and has a specific functional role.
All models exhibit above-baseline representation of the think word when instructed to think about itfinding0.718
In the intentional control experiment, all tested models show above-zero cosine similarity to the think word's concept vector.