finding

active

finding:localist-alignment-achieves-0-51-iia-on-monli-tasks-near-chance-performance

Localist alignment achieves ~0.51 IIA on MoNLI tasks, near chance performance

Localist alignment methods fail on MoNLI, where the relevant representations are distributed: IIA stays at about 0.51, near chance.
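As a rough illustration of the metric behind this finding, interchange intervention accuracy (IIA) can be read as the fraction of interchange interventions on which the network's output matches the counterfactual output predicted by the high-level causal model. The sketch below assumes a two-way label set and uses made-up labels. The function name and the example data are hypothetical and are not taken from the paper.

```python
from typing import Hashable, Sequence


def interchange_intervention_accuracy(
    model_outputs: Sequence[Hashable],
    causal_outputs: Sequence[Hashable],
) -> float:
    """Fraction of interventions where the network agrees with the causal model.

    model_outputs:  network predictions after each interchange intervention.
    causal_outputs: counterfactual labels from the high-level causal model
                    under the aligned intervention.
    """
    if len(model_outputs) != len(causal_outputs):
        raise ValueError("both sequences must have the same length")
    if not model_outputs:
        raise ValueError("at least one intervention is required")
    matches = sum(m == c for m, c in zip(model_outputs, causal_outputs))
    return matches / len(model_outputs)


# Toy data (hypothetical): the network agrees with the causal model on
# two of the four interventions.
model_after_swap = ["entailment", "neutral", "entailment", "neutral"]
causal_counterfactual = ["entailment", "entailment", "neutral", "neutral"]
iia = interchange_intervention_accuracy(model_after_swap, causal_counterfactual)
```

Under the assumption of two balanced labels, guessing would agree with the causal model about half the time, which is why an IIA of about 0.51 is described as near chance.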

Source paper

extracted_from

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

(2023) · Atticus Geiger · Zhengxuan Wu · Christopher Potts · Thomas Icard +1

Neighborhood — ranked by edge-count

Claims (1)

claim

DAS overcomes the localist limitation of prior causal abstraction by allowing individual neurons to play multiple roles via non-standard bases
supports
Central claim motivating DAS over prior methods.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Best localist alignment achieves IIA of 0.73 on hierarchical equality Both Equality Relations in Layer 1 · finding · 0.833
Shows localist alignment fails to capture the distributed structure found by DAS.
Localist Alignment Baseline · method · 0.793
Baseline that finds the axis-aligned orthogonal matrix closest to the learned distributed rotation, assuming disjoint neuron groups.
Over 80% IIA achieved using complex non-linear alignment maps on randomly initialised MLPs in hierarchical equality task · finding · 0.777
Demonstrates that high IIA can be obtained even when the model cannot solve the task.
Brute-force search achieves maximum IIA of 0.60 on MoNLI tasks · finding · 0.747
DAS substantially outperforms brute-force search on MoNLI across all models.
Near-perfect IIA can be achieved on randomly initialised models that cannot solve the task, suggesting causal alignment does not require task capability · claim · 0.747
Empirical support for the vacuousness of unrestricted causal abstraction.
Is a mutual nearest-neighbor alignment score of 0.16 indicative of strong alignment with remaining gap being noise, or does it signify poor alignment with major differences left to explain? · question · 0.743
Open question the authors leave unresolved about how to interpret the magnitude of their alignment measurements.
Cross-modal language-vision alignment reaches a maximum of approximately 0.16 on mutual nearest-neighbor metric in Figure 3, well below the theoretical maximum of 1 · finding · 0.743
Quantitative bound on observed alignment; raises the open question of whether this gap reflects noise or real misalignment.
Algorithm 1: Finding Localist Alignment Matrix · method · 0.740
Algorithm that extracts a localist (axis-aligned) approximation from any learned orthogonal rotation matrix for baseline comparison.
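
To make the idea of an axis-aligned approximation concrete, here is a minimal sketch of one way to snap a learned rotation onto the nearest signed permutation matrix, which is the kind of orthogonal matrix that assigns each neuron to exactly one rotated dimension. This is an illustrative stand-in and not the paper's Algorithm 1, whose steps are not given on this page. The function name is hypothetical, and the brute-force search over permutations is only practical for very small matrices.

```python
import math
from itertools import permutations
from typing import List


def closest_signed_permutation(rotation: List[List[float]]) -> List[List[int]]:
    """Signed permutation matrix closest to `rotation` in Frobenius norm.

    For a permutation p with signs s, ||R - P||_F^2 differs from a constant
    only by -2 * sum_i s_i * R[i][p(i)], so the best choice maximises
    sum_i |R[i][p(i)]| and copies the sign of each selected entry.
    """
    n = len(rotation)
    # Brute force over all n! permutations (assumption: n is tiny).
    best = max(
        permutations(range(n)),
        key=lambda p: sum(abs(rotation[i][p[i]]) for i in range(n)),
    )
    result = [[0] * n for _ in range(n)]
    for i, j in enumerate(best):
        result[i][j] = 1 if rotation[i][j] >= 0 else -1
    return result


# A 2x2 rotation by a small angle stays close to the identity, so its
# localist approximation keeps each neuron on its own axis.
theta = 0.1
near_identity = [
    [math.cos(theta), -math.sin(theta)],
    [math.sin(theta), math.cos(theta)],
]
localist = closest_signed_permutation(near_identity)
```

For larger matrices, the same objective could be handed to an assignment solver such as `scipy.optimize.linear_sum_assignment` with `maximize=True` on the matrix of absolute values. A rotation far from any signed permutation (for example, 45 degrees in two dimensions) has no good localist approximation, which is the situation the near-chance MoNLI result points to.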