method
active
method:zero-ablation
Zero Ablation
Intervention type that sets selected activations to zero so that their causal contribution to model output can be measured; used for interpretability analysis
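As a rough illustration of the idea, the sketch below zero-ablates one hidden unit at a time in a tiny hand-written two-layer network and records how the output shifts. Everything here (`forward`, `W1`, `W2`, the `ablate` parameter) is hypothetical and chosen for illustration; it is not the pyvene API, and real analyses would usually intervene on PyTorch activations rather than Python lists.

```python
def relu(value):
    return max(0.0, value)


def forward(x, w1, w2, ablate=()):
    """Toy 2-layer MLP. `ablate` holds hidden-unit indices to set to zero.

    Hypothetical helper for illustration only.
    """
    # Hidden layer: linear map followed by ReLU.
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    # Zero ablation: overwrite the chosen activations with 0.0.
    hidden = [0.0 if i in ablate else h for i, h in enumerate(hidden)]
    # Output layer: weighted sum of the (possibly ablated) hidden units.
    return sum(w * h for w, h in zip(w2, hidden))


# Hand-picked weights and input (assumptions, not from any real model).
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W2 = [2.0, -1.0, 0.5]
x = [1.0, 2.0]

baseline = forward(x, W1, W2)

# Effect of each unit = clean output minus output with that unit zeroed.
effects = {
    i: baseline - forward(x, W1, W2, ablate={i})
    for i in range(len(W1))
}

print(baseline)
print(effects)
```

A large absolute effect suggests the unit matters for this input. One caveat is that zero may lie far outside a unit's normal activation range, so the measured effect can partly reflect that distribution shift rather than the unit's usual role.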
Neighborhood — ranked by edge-count
Artifacts (1)
artifact
- pyvene open-source Python library · implements · The main artifact introduced in the paper: an open-source PyPI library for customizable interventions on PyTorch models
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Clamping a feature's value to zero to measure its causal effect on model output.
- Intervention method that removes a direction from residual stream activations to disrupt corresponding behavior
- Classical techniques to interrogate regulative capacity of embryos and neural crest by tissue removal or transplantation.
- Technique used in VPD to enforce mechanistic faithfulness of parameter decompositions.
- Property requiring that ablating a truth direction shifts model output from truthful to false without other side effects
- Causal intervention clamping 26 identified OTD latents to zero during steered inference to test ESR contribution
- An algorithm that determines the marginal effect of n-th order path terms by running the model multiple times with frozen attention patterns and progressively replacing activations
- Systematic sweep of 10 boost levels from threshold-3σ to threshold+3σ to characterize ESR vs. steering strength