David E. Rumelhart

PDP researcher; foundational work on distributed representations cited in DAS motivation.

Authored

Introduces

Studies

Affiliations

Cited by

Authored papers (1)

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations2023ⓒ 9
Distributed alignment search (DAS) resolves two blocking limitations of prior causal abstraction work—brute-force alignment search and the localist assumption that high-level variables map to disjoint neuron sets—by using gradient descent over orthogonal rotation matrices to find alignments in non-standard bases of neural representations. On a hierarchical equality task, a three-layer feed-forward network with hidden size 16 achieves 100% interchange intervention accuracy (IIA) under DAS at layer 1 with an 8-dimensional intervention subspace, whereas the best brute-force localist search reaches only 0.60 IIA and the closest localist alignment only 0.73 IIA. On the Monotonicity NLI benchmark, BERT-base fine-tuned on MoNLI achieves 100% IIA at layer 9 when 256 non-standard basis dimensions of the [CLS] token encode lexical entailment and 256 others encode negation, while no localist alignment exceeds 0.51 IIA on the same task. A subsequent subspace decomposition reveals a structural asymmetry: the hierarchical equality representations of w=x and y=z cannot be decomposed into representations of individual input identities (subspace DAS IIA ≈ 0.50–0.51), whereas the apparent lexical-entailment representation in BERT decomposes almost perfectly (IIA ≈ 0.97–0.98) into two word-identity representations. DAS implies that previous negative or weak causal abstraction findings may have been artifacts of the localist assumption, and that neural networks can genuinely implement tree-structured symbolic algorithms—but that apparent relational representations may sometimes be data structures over entity identities rather than true relational encodings.

David E. Rumelhart

Authored papers (1)

More papers — OpenAlex / S2

Co-authors (8)

Their work is cited by (6)

Recent mentions (1)