finding

active

finding:theorem-1-any-algorithm-is-an-input-restricted-distributed-abstraction-of-any-dnn-satisfying-mild-assumptions

Theorem 1: Any algorithm is an input-restricted distributed abstraction of any DNN satisfying mild assumptions

Central theoretical result, showing that causal abstraction becomes trivial when alignment maps are unrestricted
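The sketch below is a toy illustration of why unrestricted alignment maps trivialize the test. It is not the paper's proof construction. The task, the random "network", and all function names (`algorithm`, `make_random_network`, `lookup_alignment`, `iia`) are hypothetical. The toy assumes only two things: distinct inputs get distinct hidden states, and the readout labels every input correctly. The paper's actual assumptions may differ.

With ϕ implemented as a lookup table, interchange intervention accuracy (IIA) reaches 1.0 for two competing candidate algorithms, even though the hidden vectors are random draws.

```python
import itertools
import random

# Hypothetical toy task: inputs are pairs (a, b) with a, b in {0, 1, 2};
# the correct label is whether a == b.
INPUTS = list(itertools.product(range(3), range(3)))


def algorithm(x, var_index, v_override=None):
    """Hypothetical high-level algorithm. Its one intermediate variable V
    copies argument x[var_index]; the output compares V with the other argument.
    Passing v_override performs an intervention on V."""
    other = x[1 - var_index]
    v = x[var_index] if v_override is None else v_override
    return {"V": v, "out": v == other}


def make_random_network(seed, width=4):
    """Stand-in for a DNN: every input gets an arbitrary random hidden vector.
    Assumed (toy only): hidden states are distinct per input and the readout
    labels every input correctly."""
    rng = random.Random(seed)
    hidden = {x: tuple(rng.gauss(0.0, 1.0) for _ in range(width)) for x in INPUTS}
    assert len(set(hidden.values())) == len(INPUTS), "hidden states must be distinct"
    readout = {h: x[0] == x[1] for x, h in hidden.items()}
    return hidden, readout


def lookup_alignment(hidden, var_index):
    """Unrestricted (non-linear) alignment map phi, built as a lookup table that
    sends the hidden state of x to (V(x), other argument). It is a bijection on
    the realized hidden states, so its inverse is also a table."""
    phi = {h: (x[var_index], x[1 - var_index]) for x, h in hidden.items()}
    phi_inv = {z: h for h, z in phi.items()}
    return phi, phi_inv


def iia(hidden, readout, phi, phi_inv, var_index):
    """Interchange intervention accuracy over all (base, source) input pairs."""
    pairs = list(itertools.product(INPUTS, INPUTS))
    hits = 0
    for base, source in pairs:
        # Intervene in the aligned space: take V from the source run,
        # keep the remaining coordinate from the base run, map back to hidden space.
        v_src = phi[hidden[source]][0]
        other_base = phi[hidden[base]][1]
        net_out = readout[phi_inv[(v_src, other_base)]]
        # Algorithm counterfactual: run on base with V set to its source value.
        v_alg = algorithm(source, var_index)["V"]
        alg_out = algorithm(base, var_index, v_override=v_alg)["out"]
        hits += net_out == alg_out
    return hits / len(pairs)


if __name__ == "__main__":
    for seed in range(3):
        hidden, readout = make_random_network(seed)
        for var_index in (0, 1):
            phi, phi_inv = lookup_alignment(hidden, var_index)
            print(seed, var_index, iia(hidden, readout, phi, phi_inv, var_index))
```

Running it prints 1.0 for every seed and for both choices of the intermediate variable. Every intervened point (V from the source, other argument from the base) corresponds to some real input, so interventions stay within the realized inputs. Because ϕ may be any bijection on the realized hidden states, the geometry of those states never enters the score. This is the sense in which a perfect score under an unrestricted map says little about the network.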

Source paper

extracted_from

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?

(2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago

Neighborhood (ranked by edge count)

Papers (1)

paper

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
introduces

Claims (1)

claim

Causal abstraction is not enough for mechanistic interpretability because it becomes vacuous without assumptions about how models encode information
associated_with · supports
Central thesis of the paper

Concepts (1)

concept

Non-Linear Representation Dilemma
supports
Core contribution: the impasse in which lifting the linearity constraint on alignment maps makes causal abstraction vacuous, while keeping it may miss non-linearly encoded features

Findings (1)

finding

With only 1,000 training samples, ϕ_nonlin achieves IIA above 0.99 on the training set for the identity-of-first-argument algorithm, but fails at scale
supports
Confirms that the theorem's existence proof holds, but that learning the map in practice fails when RevNet capacity is insufficient
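IIA is commonly defined as the fraction of (base, source) input pairs on which an interchange intervention in the network, routed through the alignment map ϕ, reproduces the algorithm's counterfactual output. The definition below is stated in generic notation. The symbols are illustrative, not the paper's:

```latex
\mathrm{IIA}(\mathcal{N}, \mathcal{A}, \phi)
  = \frac{1}{|\mathcal{P}|}
    \sum_{(x_b, x_s) \in \mathcal{P}}
    \mathbb{1}\!\left[\, \mathcal{N}_{\phi}(x_b \leftarrow x_s) = \mathcal{A}(x_b \leftarrow x_s) \,\right]
```

The terms are defined as follows:

- 𝒫 is the set of evaluated (base, source) pairs.
- 𝒩_ϕ(x_b ← x_s) is the network's output on x_b after two steps: the aligned coordinates of ϕ(h(x_b)) are replaced with those of ϕ(h(x_s)), and the result is mapped back through ϕ⁻¹.
- 𝒜(x_b ← x_s) is the algorithm's output on x_b with the aligned variable set to its value on x_s.

An IIA above 0.99 therefore means that fewer than 1% of the evaluated pairs disagree.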

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Causal abstraction implicitly relies on strong assumptions about feature encoding in DNNs, and becomes trivial without such assumptions · claim · 0.777
Authors' interpretation connecting their proof to practical interpretability methodology
The feed-forward network truly implements a symbolic, tree-structured algorithm for hierarchical equality, with abstract equality relations not decomposable into input identities · claim · 0.762
DAS reveals that the neural network encodes abstract relational structure rather than raw input identities.
causal abstraction implicitly relies on strong assumptions about how features are encoded in deep neural networks (DNNs), and becomes trivial without such assumptions · quote · 0.757
Load-bearing formulation of the paper's central argument
Identification of algorithms implemented in attention layers, distributed across attention heads · finding · 0.752
VPD successfully recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.
What can causal abstraction analyses tell us about how DNNs encode features if the methods themselves rely on encoding assumptions? · question · 0.742
Circular dependency problem raised in discussion
Attention algorithms are usually distributed across attention heads · claim · 0.736
Claim supported by VPD's recovery of cross-head attention subcomponents, noted in footnote.
Simple problems require systems that expose simple solutions; forcing complex solutions signals misaligned abstraction level. · claim · 0.734
Parlog's merge process for client-server is unnecessarily complex; Linda's tuple operations remain flexible across problem variants.
Non-linear ϕ_nonlin achieves near-perfect IIA on distributive law task for both And-Or and And-Or-And algorithms, eliminating linear/identity map differences · finding · 0.734
Corroborating result on additional task confirming main paper findings