method
active
method:zero-ablation

Zero Ablation

Intervention type that sets selected model activations to zero, used in interpretability analysis to measure their causal effect on model behavior
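
A minimal sketch of the idea, assuming a toy two-layer ReLU network written in plain Python. The names `forward`, `ablate`, `W1`, and `W2` are hypothetical and chosen only for illustration; they do not come from any particular library or from the artifact attached to this entity. The sketch runs the network once unmodified and once with one hidden unit set to zero, then reports the difference as that unit's contribution to the output.

```python
def relu(x):
    return [max(0.0, v) for v in x]


def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def forward(x, W1, W2, ablate=()):
    """Run a toy two-layer network, zeroing the hidden units listed in `ablate`."""
    hidden = relu(matvec(W1, x))
    # Zero ablation: overwrite the chosen activations with 0 before they
    # are passed to the next layer.
    hidden = [0.0 if i in ablate else h for i, h in enumerate(hidden)]
    return matvec(W2, hidden)


# Hypothetical weights and input, picked so the arithmetic is easy to follow.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 2 inputs -> 3 hidden units
W2 = [[1.0, 2.0, 3.0]]                     # 3 hidden units -> 1 output
x = [1.0, 2.0]

clean = forward(x, W1, W2)                 # unmodified run
ablated = forward(x, W1, W2, ablate={2})   # hidden unit 2 set to zero
effect = [c - a for c, a in zip(clean, ablated)]

print("clean:", clean)
print("ablated:", ablated)
print("effect of unit 2:", effect)
```

In a real model the same overwrite would typically be applied through whatever activation-hooking mechanism the framework provides. Zero may lie far outside the range an activation normally takes, so results are often compared against other baselines such as mean ablation.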

Neighborhood — ranked by edge-count

Artifacts (1)

artifact

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Clamping a feature's value to zero to measure its causal effect on model output.
  • Intervention method that removes a direction from residual stream activations to disrupt the corresponding behavior.
  • Classical techniques to interrogate regulative capacity of embryos and neural crest by tissue removal or transplantation.
  • Technique used in VPD to enforce mechanistic faithfulness of parameter decompositions.
  • Property requiring that ablating a truth direction shifts model output from truthful to false without other side effects.
  • Causal intervention clamping 26 identified OTD latents to zero during steered inference to test ESR contribution.
  • An algorithm that determines the marginal effect of n-th order path terms by running the model multiple times with frozen attention patterns and progressively replacing activations.
  • Systematic sweep of 10 boost levels from threshold-3σ to threshold+3σ to characterize ESR vs. steering strength.