finding
active
finding:non-linear-alignment-map-nonlin-achieves-near-optimal-iia-across-all-layers-on-hierarchical-equality-task-eliminating-layer-dependent-degradation-seen-with-linear-maps
Non-linear alignment map ϕ_nonlin achieves near-optimal IIA across all layers on hierarchical equality task, eliminating layer-dependent degradation seen with linear maps
Key empirical result: non-linear maps overcome linear maps' failure in deeper layers
Source paper
extracted_from · (2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago
Neighborhood — ranked by edge-count
Claims (1)
claim
- Central thesis of the paper
Findings (1)
finding
- Replicates Geiger et al. 2024b pattern of layer-dependent IIA degradation with linear maps
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Alignment map implemented as a reversible residual network (RevNet); assumes non-linear representation hypothesis
- Demonstrates that high IIA can be obtained even when model cannot solve the task
- Corroborating result on additional task confirming main paper findings
- Alignment map ϕ(h)=W_orth*h using orthogonal matrix; assumes linear representation hypothesis
- Linear alignment map ϕ_lin IIA tracks DNN accuracy during Pythia-410m training progression on IOI task · finding · 0.798 · Suggests linear maps may be better correlated with genuine task implementation than non-linear maps
- Authors connect their finding to the prior probing literature debate
- Best localist alignment achieves IIA of 0.73 on hierarchical equality Both Equality Relations in Layer 1 · finding · 0.787 · Shows localist alignment fails to capture the distributed structure found by DAS.
- Shows high IIA on random models depends on entity overlap; generalisation is essential for genuine interpretation
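
Illustrative sketch: the entries above contrast an orthogonal linear alignment map, ϕ(h)=W_orth*h, with an invertible non-linear map built as a reversible residual network. The NumPy code below is a minimal, untrained illustration of both, plus the interchange step that IIA is computed from. All function names are hypothetical. The single additive coupling block is an assumption standing in for the paper's RevNet, whose exact architecture is not given on this page, and hidden states are treated as row vectors of even dimension.

```python
import numpy as np

def orthogonal_map(dim, rng):
    # Random orthogonal matrix from the QR decomposition of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def coupling_forward(h, w1, w2):
    # One reversible residual (additive coupling) block: invertible by
    # construction, whatever the inner non-linearity is.
    x1, x2 = np.split(h, 2, axis=-1)
    y1 = x1 + np.tanh(x2 @ w1)
    y2 = x2 + np.tanh(y1 @ w2)
    return np.concatenate([y1, y2], axis=-1)

def coupling_inverse(z, w1, w2):
    # Exact inverse of coupling_forward: undo the two residual updates
    # in reverse order.
    y1, y2 = np.split(z, 2, axis=-1)
    x2 = y2 - np.tanh(y1 @ w2)
    x1 = y1 - np.tanh(x2 @ w1)
    return np.concatenate([x1, x2], axis=-1)

def interchange(h_base, h_source, phi, phi_inv, k):
    # Map both hidden states into the aligned space, overwrite the first k
    # aligned dimensions of the base with those of the source, and map back.
    z_base, z_source = phi(h_base), phi(h_source)
    z_new = z_base.copy()
    z_new[..., :k] = z_source[..., :k]
    return phi_inv(z_new)
```

With an orthogonal W, the linear case would use `phi = lambda h: h @ W` and `phi_inv = lambda z: z @ W.T`; the non-linear case passes `coupling_forward` and `coupling_inverse` with shared weights. IIA is then the fraction of interchange interventions for which the network's output on the patched hidden state matches the causal model's counterfactual output.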