Causal Scrubbing

Method by Chan et al. (2022) for rigorously testing interpretability hypotheses via behavior-preserving resampling ablations.

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Causal Tracing · concept · 0.804
Mechanistic interpretability technique for locating factual associations, mentioned as a future-work direction.
causal bypassing · concept · 0.788
Confound where naming injected concepts reflects direct logit effects rather than metacognitive awareness, raised by Morris & Plunkett
causal masking · concept · 0.788
Attention restricted to previous tokens only, as in decoder-only models; leads to AR(ω)-like behaviour and no ordered phase
causal regularities · concept · 0.781
The structural-realist grounding for self-evidencing after the bounded self is relinquished.
Causal Mediation · concept · 0.780
Whether an internal direction causally controls a target behavior, verified by intervention success
Causal Decoupling · concept · 0.769
Emergent causation in which a macro-variable has causal influence on its own future independently of the micro-states.
Causal abstraction · concept · 0.768
A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
Causal Invariance · concept · 0.764
Property that causal mechanisms remain stable across environments; desirable for out-of-distribution (OOD) generalization.
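
The filter behind this listing (cosine ≥ 0.65, no typed edge to this entity) can be sketched roughly as follows. This is a minimal illustration, not the page's actual implementation: the function names, the toy 2-d embeddings, the `typed_edges` representation, and the sort by descending score are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities that are similar to `target`
    but have no typed edge to it, highest score first.

    `embeddings` maps entity name -> vector (hypothetical storage format).
    `typed_edges` is a set of frozenset({a, b}) pairs (hypothetical, undirected).
    """
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue  # skip the entity itself
        if frozenset((target, name)) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Toy data: made-up 2-d vectors, not real embeddings of these entities.
embeddings = {
    "Causal Scrubbing": [1.0, 0.0],
    "Causal Tracing": [0.8, 0.6],       # cosine about 0.8: kept
    "Causal abstraction": [0.6, 0.8],   # cosine about 0.6: below threshold
    "Linked entity": [1.0, 0.1],        # similar, but has a typed edge
}
typed_edges = {frozenset(("Causal Scrubbing", "Linked entity"))}

candidates = untyped_neighbors("Causal Scrubbing", embeddings, typed_edges)
```

With this toy data, only "Causal Tracing" survives both conditions. Entries returned this way are the ones worth reviewing as candidates for new edges or unrecognized duplicates.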