finding
active
finding:theorem-1-any-algorithm-is-an-input-restricted-distributed-abstraction-of-any-dnn-satisfying-mild-assumptions
Theorem 1: Any algorithm is an input-restricted distributed abstraction of any DNN satisfying mild assumptions
Central theoretical result, proving that unrestricted causal abstraction is trivial
Source paper
extracted_from · (2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago
Neighborhood — ranked by edge-count
Papers (1)
paper
Claims (1)
claim
- Causal abstraction is not enough for mechanistic interpretability because it becomes vacuous without assumptions about how models encode information · associated_with · supports · Central thesis of the paper
Concepts (1)
concept
- Core contribution: the impasse where lifting linearity in alignment maps makes causal abstraction vacuous, but keeping it may miss non-linearly encoded features
Findings (1)
finding
- Confirms that the theorem's existence proof holds, but that practical learnability fails when the RevNet has insufficient capacity
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Authors' interpretation connecting their proof to practical interpretability methodology
- DAS reveals that the neural network encodes abstract relational structure rather than raw input identities.
- Load-bearing formulation of the paper's central argument
- Identification of algorithms implemented in attention layers, distributed across attention heads · finding · 0.752 · VPD successfully recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.
- Circular dependency problem raised in discussion
- Claim supported by VPD's recovery of cross-head attention subcomponents, noted in footnote.
- Parlog's merge process for client-server is unnecessarily complex; Linda's tuple operations remain flexible across problem variants.
- Corroborating result on additional task confirming main paper findings