method
active
method:weight-editing
Weight Editing
Editing network weights to test predictions about circuit function; proposed as a falsifiability test for circuit claims.
Neighborhood — ranked by edge-count
Claims (1)
claim
- Argument that circuits methodology meets natural-science standards of falsifiability
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- The space of the model's parameter matrices, where VPD operations take place.
- Technique for modifying model knowledge or behavior via targeted interventions, e.g., ROME by Meng et al.
- Coefficient weighting each task loss in the MTL objective.
- Baseline MTL approach minimizing the sum of task losses with equal weights; suffers from poor task balancing.
- Logit weight contributions from a feature that arise from superposition with other features rather than from the feature's own causal role.
- Ability to surgically alter model behavior through direct parameter changes rather than activation interventions.
- The other pathway in the 'her' subnetwork, where the verb 'lost' upweights object pronouns (including 'her').