method
active
method:term-importance-analysis-via-ablation
Term Importance Analysis via Ablation
An algorithm that determines the marginal effect of n-th order path terms by running the model multiple times with frozen attention patterns, progressively replacing activations from one run to the next.
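The following is a minimal sketch of how such a procedure might look, not the method as specified by the source. It assumes a toy attention-only model with one head per layer, no MLPs or layer norm, and it measures marginal effects on logits rather than on loss. Every name here (`make_model`, `frozen_run`, `term_contributions`, the `qk`/`ov` weight layout) is hypothetical. Run 0 adds zeros in place of each head's output, which leaves only the direct path; each later run adds the outputs saved during the previous run, so run n contains all path terms up to order n, and the difference between consecutive runs is the marginal effect of the n-th order terms.

```python
import math
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def causal_softmax(scores):
    # Row i attends only to positions 0..i; later positions get weight 0.
    out = []
    for i, row in enumerate(scores):
        m = max(row[: i + 1])
        exps = [math.exp(s - m) for s in row[: i + 1]]
        z = sum(exps)
        out.append([e / z for e in exps] + [0.0] * (len(row) - i - 1))
    return out

def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]

def make_model(rng, n_layers, d_model, n_vocab):
    # Hypothetical toy model: one attention head per layer (assumption).
    layers = [{"qk": rand_matrix(rng, d_model, d_model),
               "ov": rand_matrix(rng, d_model, d_model)} for _ in range(n_layers)]
    return {"layers": layers, "unembed": rand_matrix(rng, d_model, n_vocab)}

def forward(model, x0):
    # Ordinary forward pass; also records each layer's attention pattern.
    resid, patterns = x0, []
    for layer in model["layers"]:
        scores = matmul(matmul(resid, layer["qk"]), transpose(resid))
        pattern = causal_softmax(scores)
        patterns.append(pattern)
        resid = add(resid, matmul(pattern, matmul(resid, layer["ov"])))
    return matmul(resid, model["unembed"]), patterns

def frozen_run(model, x0, patterns, prev_outputs):
    # Attention is frozen to the recorded patterns. Each head's fresh output
    # is saved, but the residual stream receives the output saved during the
    # previous run (zeros on the first run) instead.
    resid, new_outputs = x0, []
    for layer, pattern, prev in zip(model["layers"], patterns, prev_outputs):
        new_outputs.append(matmul(pattern, matmul(resid, layer["ov"])))
        resid = add(resid, prev)
    return matmul(resid, model["unembed"]), new_outputs

def term_contributions(model, x0):
    # Returns the full logits and a list whose n-th entry is the marginal
    # logit contribution of the n-th order path terms.
    full_logits, patterns = forward(model, x0)
    zeros = [[0.0] * len(x0[0]) for _ in x0]
    prev_outputs = [zeros] * len(model["layers"])
    terms, prev_logits = [], None
    for _ in range(len(model["layers"]) + 1):
        logits, prev_outputs = frozen_run(model, x0, patterns, prev_outputs)
        terms.append(logits if prev_logits is None else sub(logits, prev_logits))
        prev_logits = logits
    return full_logits, terms
```

Because the attention patterns are frozen, this toy model is linear in its input. With one head per layer, a path can pass through at most as many heads as there are layers, so the per-order contributions sum back to the full model's logits after `n_layers + 1` frozen runs. That gives a quick sanity check on the decomposition.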
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one. These are candidates for new edges or unrecognized duplicates.
- Intervention type that sets activations to zero, used for interpretability analysis.
- Intervention method that removes a direction from residual stream activations to disrupt corresponding behavior.
- Gradient-based attribution approximates ablation impact, enabling fast search for causally important features.
- Classical techniques to interrogate regulative capacity of embryos and neural crest by tissue removal or transplantation.
- Property requiring that ablating a truth direction shifts model output from truthful to false without other side effects.
- A measure of whether a subcomponent is necessary to reproduce model behavior on a specific prompt, predicted by the causal importance network.
- Clamping a feature's value to zero to measure its causal effect on model output.
- Weighted Spearman correlation that corrects for sampling bias in automated interpretability evaluation.