claim
active
claim:feature-attribution-correlates-well-with-ablation-effects-making-it-an-efficient-proxy-for-causal-effect
Feature attribution correlates well with ablation effects, making it an efficient proxy for causal effect.
Gradient-based attribution approximates ablation impact, enabling fast search for causally important features.
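A minimal sketch of the idea, not the source paper's method: on a hypothetical toy model (random weights `W` and `v`, feature activations `f`, all invented for illustration), the first-order attribution score, gradient times the change in activation when a feature is set to zero, can be compared against the exact effect of zero-ablating each feature. Attribution needs one gradient computation for all features, whereas ablation needs one forward pass per feature.

```python
import numpy as np

# Hypothetical toy model: y = v . tanh(W f). All names and sizes are assumptions.
rng = np.random.default_rng(0)
n_features, n_hidden = 32, 16
W = rng.normal(scale=0.1, size=(n_hidden, n_features))
v = rng.normal(size=n_hidden)
f = rng.normal(size=n_features)  # feature activations on one input


def output(features):
    """Scalar model output for a vector of feature activations."""
    return float(v @ np.tanh(W @ features))


def ablation_effects(features):
    """Exact change in output when each feature is clamped to zero (one forward pass per feature)."""
    base = output(features)
    effects = np.empty(len(features))
    for i in range(len(features)):
        ablated = features.copy()
        ablated[i] = 0.0
        effects[i] = output(ablated) - base
    return effects


def attribution_scores(features):
    """First-order estimate of the same change: gradient * (0 - activation)."""
    h = W @ features
    grad = (v * (1.0 - np.tanh(h) ** 2)) @ W  # analytic dy/df for this toy model
    return grad * (0.0 - features)


ablation = ablation_effects(f)
attribution = attribution_scores(f)
correlation = np.corrcoef(ablation, attribution)[0, 1]
print(f"Pearson correlation between attribution and ablation: {correlation:.3f}")
```

In this sketch the approximation is close because ablating a single feature perturbs the hidden pre-activations only slightly; with stronger nonlinearity or larger activations the two quantities would be expected to diverge more.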
Source paper
extracted_from
Neighborhood: ranked by edge-count
Findings (1)
finding
- Validation of attribution as a fast proxy for causal importance.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Authors' central interpretive assertion that their method meaningfully mitigates unwanted behaviors.
- Stronger version: all cognition attributions rely on observable behavior.
- Conceptual framing: integrates mechanistic interpretability tools with alignment-focused data curation.
- Replication of the Wu et al. 2023 finding; the DAS expressivity concern is validated in the CausalGym setup.
- Clamping a feature's value to zero to measure its causal effect on model output.
- Authors' interpretation connecting their proof to practical interpretability methodology.
- Methodological claim about the scientific value of combining causal abstraction with representational geometry analysis.