question

active


What can causal abstraction analyses tell us about how DNNs encode features if the methods themselves rely on encoding assumptions?

Circular dependency problem raised in discussion

Source paper

extracted_from

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?

(2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago

Neighborhood — ranked by edge-count

Papers (1)

paper

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
associated_with

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Causal abstraction implicitly relies on strong assumptions about feature encoding in DNNs, and becomes trivial without such assumptions · claim · 0.914
Authors' interpretation connecting their proof to practical interpretability methodology
causal abstraction implicitly relies on strong assumptions about how features are encoded in deep neural networks (DNNs), and becomes trivial without such assumptions · quote · 0.897
Load-bearing formulation of the paper's central argument
What is the connection between information-encoding assumptions and causal abstraction? · question · 0.831
Identified as exciting future work direction
An interplay between causal abstraction and feature geometry deepens mechanistic understanding of language models · claim · 0.810
Methodological claim about the scientific value of combining causal abstraction with representational geometry analysis
Causal abstraction is not enough for mechanistic interpretability because it becomes vacuous without assumptions about how models encode information · claim · 0.801
Central thesis of the paper
Early causal abstraction methods (Geiger et al. 2021) implicitly rely on the privileged bases hypothesis, while recent methods (Geiger et al. 2024b) rely on the linear representation hypothesis · claim · 0.778
Historical framing of how representation assumptions have evolved in causal interpretability
Assuming linear representations enables identifying the location of certain variables in a DNN, but many insights fail to generalise when more powerful non-linear maps are used · claim · 0.771
Interpretive claim about what linear DAS results actually tell us
Investigating the causal substructure of neural representations is necessary to avoid misidentifying data structures of simpler representations as abstract concepts · claim · 0.769
Motivated by the finding that lexical entailment decomposes into word identities.