concept
active
concept:manual-model-editing
Manual model editing
Ability to surgically alter model behavior through direct parameter changes rather than activation interventions.
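The contrast between a parameter change and an activation intervention can be illustrated with a toy linear layer. The sketch below is only an illustration under stated assumptions: it uses a generic rank-one weight update, not VPD or ROME themselves, and every name in it (`rank_one_edit`, `steer_output`, the 2x2 matrix `W`) is hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rank_one_edit(W, key, target):
    """Parameter edit: return a new matrix that maps `key` to `target`.

    Uses the generic update W + (target - W key) key^T / (key^T key).
    Inputs orthogonal to `key` are mapped exactly as before.
    """
    residual = [t - o for t, o in zip(target, matvec(W, key))]
    norm_sq = sum(k * k for k in key)
    return [
        [w + r * k / norm_sq for w, k in zip(row, key)]
        for row, r in zip(W, residual)
    ]


def steer_output(W, x, offset):
    """Activation intervention: weights stay fixed, an offset is added at inference."""
    return [o + d for o, d in zip(matvec(W, x), offset)]


# Hypothetical toy layer: the identity map on 2-d inputs.
W = [[1.0, 0.0], [0.0, 1.0]]
key = [1.0, 2.0]          # input whose behavior we want to change
target = [0.0, 0.0]       # desired output for that input
orthogonal = [2.0, -1.0]  # input orthogonal to `key`

W_edited = rank_one_edit(W, key, target)
edited_out = matvec(W_edited, key)               # now matches `target`
unchanged_out = matvec(W_edited, orthogonal)     # same as before the edit
steered_out = steer_output(W, key, [-1.0, -2.0])  # same effect, W untouched
```

The edited weights persist for every later forward pass, whereas the steering offset has to be re-applied on each call. That persistence is the sense in which the change lives at the parameter level.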
Neighborhood — ranked by edge-count
Papers (1)
paper
Claims (1)
claim
- Applied capability claim: VPD enables surgical changes to model behaviour at the parameter level.
Concepts (2)
concept
- Manual model editing through parameter manipulation · related_to, same_as · Application enabled by VPD: direct manipulation of weight matrices for interpretable model modification.
- Model Editing · related_to · Technique for modifying model knowledge or behavior via targeted interventions, e.g., ROME by Meng et al.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- A representation that captures relevant aspects of a system; according to the theorem, the regulator must embody this.
- Primary substrate for manifold steering experiments; demonstrates method on reasoning and in-context tasks.
- Primary test domain for manifold steering, including reasoning and ICL tasks
- Edits MLP weights for all layers to modify model behavior; used by Abdelnabi & Salem to decrease verbalized evaluation awareness.
- Probability of data under the model, penalizing complexity and rewarding accuracy.
- Comparing models using log-evidence approximated by free energy.
- Using interventions to guide model generation behavior, e.g., adding sentiment vectors at inference time
- Technique to measure representational compatibility by integrating intermediate representations of one model into another
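The "cosine ≥ 0.65 · no typed edge" filter behind this list can be sketched as follows. This is a minimal illustration, not the page's actual pipeline: the embedding vectors, the entity ids, and the function names `cosine` and `similar_untyped` are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_untyped(query, candidates, typed_ids, threshold=0.65):
    """Return (id, score) pairs at or above `threshold`, skipping entities
    that already have a typed edge, sorted by descending score."""
    hits = [
        (cid, cosine(query, vec))
        for cid, vec in candidates.items()
        if cid not in typed_ids
    ]
    return sorted(
        (hit for hit in hits if hit[1] >= threshold),
        key=lambda hit: hit[1],
        reverse=True,
    )


# Hypothetical 2-d embeddings; real entity embeddings would be much larger.
query = [1.0, 0.0]
candidates = {
    "a": [1.0, 0.0],   # same direction as the query
    "b": [1.0, 1.0],   # 45 degrees away
    "c": [0.0, 1.0],   # orthogonal, below the threshold
    "d": [2.0, 0.1],   # similar, but already linked by a typed edge
}
related = similar_untyped(query, candidates, typed_ids={"d"})
related_ids = [cid for cid, _ in related]
```

Entities that pass this filter are only candidates; whether one deserves a typed edge or is a duplicate still needs review.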