Weight Editing

Editing network weights to test predictions about circuit function; proposed as a falsifiability test for circuit claims.
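A minimal sketch of the idea, assuming a hand-built two-layer ReLU network in NumPy whose "circuit" is known by construction. The network, the `forward` and `ablate_hidden_unit` helpers, and the specific claim being tested are all hypothetical illustrations, not taken from any particular paper or library. The claim predicts what should happen to the output after a targeted weight edit; if the edited model does not behave as predicted, the claim is falsified.

```python
import numpy as np


def forward(W1, W2, x):
    """Two-layer ReLU network: x -> relu(W1 @ x) -> W2 @ hidden."""
    hidden = np.maximum(W1 @ x, 0.0)
    return W2 @ hidden


def ablate_hidden_unit(W2, unit):
    """Return a copy of W2 with one hidden unit's outgoing weights zeroed."""
    edited = W2.copy()
    edited[:, unit] = 0.0
    return edited


# Toy weights (assumption of this sketch): hidden unit 0 carries input 0 to
# output 0, and hidden unit 1 carries input 1 to output 1.
W1 = np.array([[1.0, 0.0],
               [0.0, 1.0]])
W2 = np.array([[1.0, 0.0],
               [0.0, 1.0]])

x = np.array([2.0, 3.0])
baseline = forward(W1, W2, x)
edited = forward(W1, ablate_hidden_unit(W2, unit=0), x)

# Circuit claim under test: "hidden unit 0 is the only path from input 0 to
# output 0". Prediction: after the edit, output 0 falls to zero while
# output 1 is unchanged. Any other outcome would falsify the claim.
claim_survives = bool(
    np.isclose(edited[0], 0.0) and np.isclose(edited[1], baseline[1])
)
print("baseline:", baseline, "edited:", edited, "claim survives:", claim_survives)
```

The check on output 1 acts as a control: an edit that also disturbed unrelated behavior would suggest the claimed circuit is not as isolated as stated.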

Neighborhood — ranked by edge-count

claim

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Weight space · concept · 0.768
The space of the model's parameter matrices, where VPD operations take place.
Model Editing · concept · 0.762
Technique for modifying model knowledge or behavior via targeted interventions, e.g., ROME by Meng et al.
Task weight · concept · 0.755
Coefficient weighting each task loss in the MTL objective.
Equal Weighting · framework · 0.755
Baseline MTL approach that minimizes the sum of task losses with equal weights; suffers from task-balancing problems.
Interference Weights · concept · 0.722
Logit weight contributions from a feature that arise from superposition with other features, not from the feature's own causal role.
Manual model editing · concept · 0.718
Ability to surgically alter model behavior through direct parameter changes rather than activation interventions.
Structure Editor · method · 0.716
Object pronoun upweighting · concept · 0.715
The other pathway in the 'her' subnetwork, where the verb 'lost' upweights object pronouns (including 'her').