finding

active

finding:best-localist-alignment-achieves-iia-of-0-73-on-hierarchical-equality-both-equality-relations-in-layer-1

Best localist alignment achieves IIA of 0.73 on the hierarchical equality task (Both Equality Relations, Layer 1)

Shows localist alignment fails to capture the distributed structure found by DAS.

Source paper

extracted_from

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

(2023) · Atticus Geiger · Zhengxuan Wu · Christopher Potts · Thomas Icard +1

Neighborhood — ranked by edge-count

Claims (1)

claim

DAS overcomes the localist limitation of prior causal abstraction by allowing individual neurons to play multiple roles via non-standard bases
supports
Central claim motivating DAS over prior methods.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Localist alignment achieves ~0.51 IIA on MoNLI tasks, near chance performance · finding · 0.833
Localist methods fail entirely on MoNLI distributed representations.
Identity of first argument algorithm IIA consistently hovers around 50% for all alignment map types on hierarchical equality task · finding · 0.807
Exception to the general trend; attributed to insufficient RevNet capacity rather than to the algorithm not being implemented.
Linear alignment map ϕ_lin shows substantial IIA decrease in third layer for both equality relations and left equality relation algorithms in hierarchical equality task · finding · 0.804
Replicates Geiger et al. 2024b pattern of layer-dependent IIA degradation with linear maps
Brute-force search achieves best IIA of 0.60 on hierarchical equality Both Equality Relations in Layer 1 · finding · 0.795
DAS substantially outperforms brute-force search (1.00 vs 0.60 IIA) on the hierarchical equality task.
Localist Alignment Baseline · method · 0.793
Baseline that finds the axis-aligned orthogonal matrix closest to the learned distributed rotation, assuming disjoint neuron groups.
Over 80% IIA achieved using complex non-linear alignment maps on randomly initialised MLPs in hierarchical equality task · finding · 0.792
Demonstrates that high IIA can be obtained even when the model cannot solve the task.
Non-linear alignment map ϕ_nonlin achieves near-optimal IIA across all layers on hierarchical equality task, eliminating layer-dependent degradation seen with linear maps · finding · 0.787
Key empirical result: non-linear maps overcome linear maps' failure in deeper layers
DAS achieves 100% IIA on hierarchical equality task with |N|=16, intervention size 8, Layer 1 · finding · 0.770
DAS discovers a perfect alignment between the feed-forward network and the Both Equality Relations high-level model.
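
The entries above contrast localist (axis-aligned) alignments with the distributed alignments found by DAS, all scored by interchange intervention accuracy (IIA). A minimal NumPy sketch of that contrast follows. It is an illustration under stated assumptions, not the paper's implementation: the function names `interchange` and `iia`, the convention that the rotation is applied as `rotation @ h`, and the toy vectors are all hypothetical.

```python
import numpy as np

def interchange(base, source, rotation, dims):
    """Swap the coordinates `dims` of `base` with those of `source`,
    measured in the basis given by the orthogonal matrix `rotation`.

    Assumption: a localist alignment corresponds to an identity (or
    permutation) rotation, so whole neurons are swapped; a distributed
    alignment uses a general orthogonal matrix.
    """
    rotated_base = rotation @ base        # hidden state in the rotated basis
    rotated_source = rotation @ source
    rotated_base[dims] = rotated_source[dims]  # intervene on the chosen subspace
    return rotation.T @ rotated_base      # map back to the neuron basis

def iia(predictions, counterfactual_labels):
    """Fraction of interventions where the model's output matches the
    output of the high-level model under the aligned intervention."""
    predictions = np.asarray(predictions)
    counterfactual_labels = np.asarray(counterfactual_labels)
    return float(np.mean(predictions == counterfactual_labels))

# Toy hidden states (hypothetical values, 4 neurons).
base = np.array([1.0, 2.0, 3.0, 4.0])
source = np.array([5.0, 6.0, 7.0, 8.0])

# Localist case: identity rotation, so neurons 0 and 1 are swapped wholesale.
localist = interchange(base, source, np.eye(4), [0, 1])

# Distributed case: a random orthogonal basis obtained from a QR decomposition.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
distributed = interchange(base, source, q, [0, 1])
```

In the localist case the intervened state is simply the base vector with neurons 0 and 1 overwritten; in the distributed case every neuron can change, which is the sense in which individual neurons may play multiple roles.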