finding
active
finding:das-finds-causal-effect-at-all-training-timesteps-including-when-model-is-just-initialised

DAS finds a causal effect at every training checkpoint, including when the model has only just been initialised

Corroborates the finding of Wu et al. (2023) that the expressivity of DAS (distributed alignment search) inflates causal effect estimates

Source paper

extracted_from
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
(2024) · Aryaman Arora · Dan Jurafsky · Christopher Potts
