question
active
question:how-much-of-the-causal-effect-found-by-das-is-due-to-its-expressivity-rather-than-any-aspect-of-the-representation-being-studied
How much of the causal effect found by DAS is due to its expressivity rather than any aspect of the representation being studied?
Core methodological question motivating the introduction of selectivity and control tasks
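A minimal sketch of the selectivity idea, assuming it follows the usual control-task recipe: a method's score on the real task minus its score on a control task with arbitrary labels. The source does not give the exact formula, so the function name `selectivity` and all numbers below are hypothetical and for illustration only.

```python
def selectivity(task_effect: float, control_effect: float) -> float:
    """Score on the real task minus score on the control task (assumed definition)."""
    return task_effect - control_effect


# Hypothetical numbers, not taken from the paper.
# An expressive method can score well even on the control task,
# which lowers its selectivity relative to a simpler method.
expressive_method = selectivity(task_effect=5.0, control_effect=3.0)
simple_method = selectivity(task_effect=4.0, control_effect=1.0)

print(f"expressive method selectivity: {expressive_method}")
print(f"simple method selectivity: {simple_method}")
```

Under this assumed definition, a high raw effect is only informative if the same method cannot reach a similar effect on the control task.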
Source paper
extracted_from · (2024) · Aryaman Arora · Dan Jurafsky · Christopher Potts
Neighborhood — ranked by edge-count
Papers (1)
paper
Findings (1)
finding
- Probe achieves selectivity of 4.20 on pythia-410m, slightly exceeding DAS selectivity of 3.96
  answered_by · Key result showing that for models larger than pythia-70m, probe selectivity matches or exceeds DAS selectivity
Claims (1)
claim
- Author interpretation of selectivity results showing DAS advantage diminishes when controlling for expressivity
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Replication of Wu et al. 2023 finding; DAS expressivity concern validated in CausalGym setup
- DAS finds causal effect at all training timesteps including when model is just initialised
  finding · 0.825 · Corroborates Wu et al. 2023 finding that DAS expressivity inflates causal effect estimates
- Central thesis of the paper
- Central claim motivating DAS over prior methods.
- Gradient-based attribution approximates ablation impact, enabling fast search for causally important features.
- Authors' interpretation connecting their proof to practical interpretability methodology
- Methodological claim about the scientific value of combining causal abstraction with representational geometry analysis
- Supported by the finding that non-trivial rotations are required to find aligned representations.