Zero Ablation

Intervention type that sets selected activations to zero, used in interpretability analysis to measure a component's causal effect on model output
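
A minimal sketch of zero ablation on a PyTorch model, using a standard forward hook rather than any particular library's API. The toy two-layer network, the helper name `zero_ablate`, and the choice of hidden units are all assumptions made for illustration; they do not come from the entry above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model standing in for any PyTorch network (hypothetical, for illustration).
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)


def zero_ablate(units):
    """Build a forward hook that zeroes the given output units of a module."""
    def hook(module, inputs, output):
        ablated = output.clone()
        ablated[..., units] = 0.0
        return ablated  # a value returned from a forward hook replaces the output
    return hook


x = torch.randn(3, 4)

with torch.no_grad():
    clean = model(x)

    # Zero hidden units 0-3 at the ReLU output, then rerun the same input.
    handle = model[1].register_forward_hook(zero_ablate([0, 1, 2, 3]))
    try:
        ablated = model(x)
    finally:
        handle.remove()  # always detach the hook so later runs are unaffected

    restored = model(x)

# Largest change in any output value caused by the ablation.
effect = (clean - ablated).abs().max().item()
print(f"max output change under zero ablation: {effect:.4f}")
```

Comparing `clean` with `ablated` gives a rough estimate of how much the zeroed units contribute to the output. Zero may lie far from an activation's typical value, so the measured effect can reflect the off-distribution input as well as the removed information.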

Neighborhood — ranked by edge-count

pyvene open-source Python library · artifact · implements
The main artifact introduced in the paper: an open-source PyPI library for customizable interventions on PyTorch models

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Feature ablation (zeroing feature activations) · method · 0.822
Clamping a feature's value to zero to measure its causal effect on model output.
Directional Ablation · method · 0.805
Intervention method that removes a direction from residual stream activations to disrupt corresponding behavior
Grafting and Ablation · method · 0.794
Classical techniques to interrogate regulative capacity of embryos and neural crest by tissue removal or transplantation.
Adversarial ablation · method · 0.781
Technique used in VPD to enforce mechanistic faithfulness of parameter decompositions.
Surgical Ablation Property · concept · 0.762
Property requiring that ablating a truth direction shifts model output from truthful to false without other side effects
Off-Topic Detector Latent Ablation · method · 0.760
Causal intervention clamping 26 identified OTD latents to zero during steered inference to test ESR contribution
Term Importance Analysis via Ablation · method · 0.754
An algorithm that determines the marginal effect of n-th order path terms by running the model multiple times with frozen attention patterns and progressively replacing activations
Boost Level Ablation Sweep · method · 0.742
Systematic sweep of 10 boost levels from threshold-3σ to threshold+3σ to characterize ESR vs. steering strength
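
Since several neighbors in this list may be duplicates of this entity, a small sketch can help tell two of them apart. It contrasts zero ablation (zeroing chosen coordinates) with directional ablation (subtracting the projection onto a direction). The vectors and function names are hypothetical and the code uses only the standard library; it shows the arithmetic, not any cited paper's implementation.

```python
import math


def zero_ablate(x, indices):
    """Zero ablation: set the chosen coordinates of x to zero."""
    drop = set(indices)
    return [0.0 if i in drop else a for i, a in enumerate(x)]


def ablate_direction(x, d):
    """Directional ablation: remove the component of x along direction d."""
    norm = math.sqrt(sum(v * v for v in d))
    unit = [v / norm for v in d]
    proj = sum(a * u for a, u in zip(x, unit))  # scalar projection of x onto d
    return [a - proj * u for a, u in zip(x, unit)]


x = [3.0, 4.0, 1.0]   # hypothetical activation vector
d = [1.0, 1.0, 0.0]   # hypothetical direction to remove

x_zero = zero_ablate(x, [0])
x_dir = ablate_direction(x, d)

# After directional ablation the result is orthogonal to d (up to rounding).
residual = sum(a * b for a, b in zip(x_dir, d))
print(x_zero, x_dir, residual)
```

Zero ablation is tied to a coordinate basis (neurons or learned features), whereas directional ablation works for any direction in activation space. The two coincide only when the direction is a single basis vector.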