question

active

question:what-should-you-do-if-you-want-to-perform-a-causal-analysis-of-your-dnn

What should you do if you want to perform a causal analysis of your DNN?

Practical question the paper attempts to answer in its conclusion

Source paper

extracted_from

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?

(2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago

Neighborhood — ranked by edge-count

Concepts (1)

concept

Non-Linear Representation Dilemma
associated_with
Core contribution: the impasse in which dropping the linearity constraint on alignment maps makes causal abstraction vacuous, while keeping it may miss non-linearly encoded features

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

What can causal abstraction analyses tell us about how DNNs encode features if the methods themselves rely on encoding assumptions? · question · 0.768
Circular dependency problem raised in discussion
How do we establish bidirectional causal relationships between neural systems? · question · 0.754
Motivates the bidirectional design of MAS over unidirectional model stitching.
Causal abstraction implicitly relies on strong assumptions about feature encoding in DNNs, and becomes trivial without such assumptions · claim · 0.749
Authors' interpretation connecting their proof to practical interpretability methodology
causal abstraction implicitly relies on strong assumptions about how features are encoded in deep neural networks (DNNs), and becomes trivial without such assumptions · quote · 0.743
Load-bearing formulation of the paper's central argument
Causal abstraction analysis · method · 0.733
The formal method used to establish that the identified circuit causally mediates the model's cyclic reasoning behavior
Investigating the causal substructure of neural representations is necessary to avoid misidentifying data structures of simpler representations as abstract concepts · claim · 0.732
Motivated by the finding that lexical entailment decomposes into word identities.
Causal Tracing · concept · 0.731
Mechanistic interpretability technique for locating factual associations, mentioned as future work direction.
Ultimately, we would like to understand neural networks well enough to be able to intentionally design them. · quote · 0.722
Vision statement in the conclusion.