method
active
method:weight-editing

Weight Editing

Editing network weights to test predictions about circuit function; proposed as a falsifiability test for circuit claims.
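A minimal sketch of the idea, assuming a toy two-layer network with hand-set weights. The network, the helper names (`forward`, `edit_weight`, `effect_of_input`), and the circuit claim itself are hypothetical illustrations, not taken from any cited work. The claim predicts what a specific weight edit should do; if the edited model behaves differently, the claim is falsified.

```python
import copy

def forward(weights, x):
    """Toy two-layer network: y = W2 @ relu(W1 @ x)."""
    w1, w2 = weights
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

def edit_weight(weights, layer, row, col, value):
    """Return a copy of the weights with one entry overwritten."""
    edited = copy.deepcopy(weights)
    edited[layer][row][col] = value
    return edited

def effect_of_input(weights, index):
    """Change in output 0 when input `index` goes from 0 to 1, others held at 0."""
    base = [0.0, 0.0]
    probe = list(base)
    probe[index] = 1.0
    return forward(weights, probe)[0] - forward(weights, base)[0]

# Hypothetical weights: hidden unit 0 reads x[0], hidden unit 1 reads x[1].
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[2.0, 3.0]]
weights = [W1, W2]

# Circuit claim under test: x[0] reaches the output only through hidden unit 0.
# Prediction: zeroing W2[0][0] removes the x[0] effect and leaves the x[1] effect intact.
edited = edit_weight(weights, layer=1, row=0, col=0, value=0.0)

claim_survives = (
    effect_of_input(edited, 0) == 0.0
    and effect_of_input(edited, 1) == effect_of_input(weights, 1)
)
print("circuit claim survives the edit:", claim_survives)
```

In a real model the prediction would be checked statistically over a dataset rather than with exact equality on single probes.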

Neighborhood — ranked by edge-count

Claims (1)

claim

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Weight space · concept · 0.768
    The space of the model's parameter matrices, where VPD operations take place.
  • Model Editing · concept · 0.762
    Technique for modifying model knowledge or behavior via targeted interventions, e.g., ROME by Meng et al.
  • Task weight · concept · 0.755
    Coefficient weighting each task loss in the MTL objective.
  • Equal Weighting · framework · 0.755
    Baseline MTL approach minimizing sum of task losses with equal weights; suffers from task balancing
  • Logit weight contributions from a feature that arise due to superposition with other features, not from the feature's own causal role
  • Ability to surgically alter model behavior through direct parameter changes rather than activation interventions.
  • Structure Editor · method · 0.716
  • The other pathway in the 'her' subnetwork, where the verb 'lost' upweights object pronouns (including 'her').