method
active
method:causal-scrubbing

Causal Scrubbing

Method by Chan et al. 2022 for rigorously testing interpretability hypotheses via interventions
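The toy sketch below illustrates the resampling-ablation idea behind causal scrubbing. It is deliberately tiny and fully hypothetical: the two-node "model" (`h1`, `h2`, `output`), the XOR task, and the helper names are all assumptions. It is not Chan et al.'s implementation, which works over a model's full computational graph and usually measures loss rather than accuracy. Under a hypothesis, each internal activation is recomputed on a different input that the hypothesis claims is equivalent for that node. Nodes claimed to be irrelevant get a fully random input. If performance survives the scrubbing, the hypothesis is not refuted.

```python
import random

# Hypothetical toy model: two internal nodes feeding an output node.
def h1(x):
    return x[0] ^ x[1]

def h2(x):
    return x[2]

def output(a1, a2):
    # The output reads only h1; a2 is ignored by construction.
    return a1

def label(x):
    return x[0] ^ x[1]

def resample(dataset, x, keys, rng):
    """Draw x' from the dataset that agrees with x on the features in `keys`."""
    candidates = [z for z in dataset if all(z[k] == x[k] for k in keys)]
    return rng.choice(candidates)  # never empty: x itself qualifies

def scrubbed_accuracy(dataset, h1_keys, rng):
    """Accuracy when each activation is computed on a resampled input
    that the hypothesis treats as equivalent for that node."""
    correct = 0
    for x in dataset:
        x1 = resample(dataset, x, h1_keys, rng)  # claim: h1 depends only on h1_keys
        x2 = rng.choice(dataset)                 # claim: h2 is irrelevant
        correct += output(h1(x1), h2(x2)) == label(x)
    return correct / len(dataset)

rng = random.Random(0)
dataset = [tuple(rng.randint(0, 1) for _ in range(3)) for _ in range(500)]

original = sum(output(h1(x), h2(x)) == label(x) for x in dataset) / len(dataset)
good = scrubbed_accuracy(dataset, (0, 1), rng)  # correct hypothesis
bad = scrubbed_accuracy(dataset, (0,), rng)     # wrong: h1 uses only x[0]
print(original, good, bad)
```

Under the correct hypothesis, the scrubbed accuracy matches the original. Under the hypothesis that wrongly drops `x[1]`, accuracy falls to roughly chance. That drop is the signal causal scrubbing uses to reject a hypothesis.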

Neighborhood — ranked by edge-count

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Causal Tracing · concept · 0.804
    Mechanistic interpretability technique for locating factual associations; mentioned as a future-work direction.
  • causal bypassing · concept · 0.788
    Confound where naming injected concepts reflects direct logit effects rather than metacognitive awareness; raised by Morris & Plunkett.
  • causal masking · concept · 0.788
    Attention restricted to previous tokens only, as in decoder-only models; leads to AR(ω)-like behaviour and no ordered phase
  • The structural-realist grounding for self-evidencing after the bounded self is relinquished.
  • Causal Mediation · concept · 0.780
    Whether an internal direction causally controls a target behavior, verified by intervention success
  • Causal Decoupling · concept · 0.769
    Emergent causation in which a macro-variable causally influences its own future independently of the micro-states.
  • Causal abstraction · concept · 0.768
    A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
  • Causal Invariance · concept · 0.764
    Property that causal mechanisms remain stable across environments; desirable for out-of-distribution (OOD) generalization.