question

active


What is the connection between information encoding assumptions and causal abstraction?

Identified as an exciting direction for future work

Source paper

extracted_from

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?

(2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago

Neighborhood — ranked by edge-count

Papers (1)

paper

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
associated_with

Concepts (1)

concept

Non-Linear Representation Dilemma
gates
Core contribution: the impasse in which lifting the linearity constraint on alignment maps makes causal abstraction vacuous, while keeping the constraint may miss non-linearly encoded features

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
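The selection rule in this section's header (cosine at or above 0.65, no typed edge) can be sketched as below. This is only an illustration: the page does not say how embeddings are produced or how edges are stored, so the `embeddings` dictionary, the `typed_edges` set, and the function names are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_candidates(target, embeddings, typed_edges, threshold=0.65):
    """Rank entities similar to `target` that have no typed edge to it.

    embeddings:  hypothetical mapping of entity name -> vector
    typed_edges: hypothetical set of (name, name) pairs that are already linked
    """
    scored = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities already connected by a typed edge in either direction.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            scored.append((name, score))
    # Highest similarity first, matching the ordering of the list in this section.
    return sorted(scored, key=lambda pair: -pair[1])

# Toy data: "c" is similar to "q" but already linked, "b" is orthogonal to "q".
embeddings = {"q": [1.0, 0.0], "a": [1.0, 0.1], "b": [0.0, 1.0], "c": [1.0, 1.0]}
typed_edges = {("q", "c")}
candidates = similarity_candidates("q", embeddings, typed_edges)
```

With this toy data only "a" survives both filters.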

Causal abstraction implicitly relies on strong assumptions about feature encoding in DNNs, and becomes trivial without such assumptions · claim · 0.840
Authors' interpretation connecting their proof to practical interpretability methodology
What can causal abstraction analyses tell us about how DNNs encode features if the methods themselves rely on encoding assumptions? · question · 0.831
Circular dependency problem raised in discussion
Causal abstraction is not enough for mechanistic interpretability because it becomes vacuous without assumptions about how models encode information · claim · 0.816
Central thesis of the paper
causal abstraction implicitly relies on strong assumptions about how features are encoded in deep neural networks (DNNs), and becomes trivial without such assumptions · quote · 0.804
Load-bearing formulation of the paper's central argument
Approximate Causal Abstraction · concept · 0.791
Graded notion of causal abstraction measured by interchange intervention accuracy (IIA); when IIA is alpha < 100%, the model is alpha-on-average approximately abstract.
An interplay between causal abstraction and feature geometry deepens mechanistic understanding of language models · claim · 0.785
Methodological claim about the scientific value of combining causal abstraction with representational geometry analysis
Early causal abstraction methods (Geiger et al. 2021) implicitly rely on the privileged bases hypothesis, while recent methods (Geiger et al. 2024b) rely on the linear representation hypothesis · claim · 0.783
Historical framing of how representation assumptions have evolved in causal interpretability
Causal abstraction · concept · 0.779
A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
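The Approximate Causal Abstraction entry above grades abstraction by interchange intervention accuracy (IIA). A minimal sketch of how that score can be computed on a toy model follows. Everything here is an assumption made for illustration, not the paper's setup: the high-level algorithm (V = a AND b, output = V OR c), the two-unit hidden layer, and the choice of a single hidden coordinate as the alignment (which amounts to assuming a privileged basis).

```python
from itertools import product

def high_level(x, v_override=None):
    """Hypothetical high-level algorithm: V = a AND b, output = V OR c."""
    a, b, c = x
    v = (a and b) if v_override is None else v_override
    return int(v or c)

def high_level_v(x):
    """Value of the intermediate variable V on input x."""
    a, b, _ = x
    return int(a and b)

def low_level_hidden(x):
    """Toy low-level model: hidden unit 0 stores a*b, hidden unit 1 stores c."""
    a, b, c = x
    return [a * b, c]

def low_level_output(h):
    """Toy readout: fires if either hidden unit is active."""
    return int(h[0] + h[1] > 0)

def iia(site):
    """Fraction of (base, source) pairs on which patching hidden unit `site`
    from the source run reproduces the high-level output with V set from source."""
    inputs = list(product([0, 1], repeat=3))
    hits = total = 0
    for base, source in product(inputs, repeat=2):
        h = low_level_hidden(base)
        h[site] = low_level_hidden(source)[site]  # interchange intervention
        predicted = low_level_output(h)
        expected = high_level(base, v_override=high_level_v(source))
        hits += predicted == expected
        total += 1
    return hits / total

aligned_score = iia(0)     # unit 0 really encodes V
misaligned_score = iia(1)  # unit 1 encodes c, not V
```

Aligning V with hidden unit 0 gives an IIA of 1.0, while aligning it with unit 1 gives a score below 1.0. The score therefore depends on which alignment map is allowed, which is the encoding assumption this question asks about.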