method
active
method:localist-alignment-baseline
Localist Alignment Baseline
Baseline that finds the axis-aligned orthogonal matrix (i.e., a signed permutation matrix) closest to the learned distributed rotation, assuming that high-level variables align with disjoint groups of neurons.
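A minimal sketch of the projection step, assuming each neuron forms its own group (the simplest disjoint-group case) and that "closest" means Frobenius distance; the function name `nearest_signed_permutation` is hypothetical, and the baseline's actual procedure may differ. Because every signed permutation Q has ‖Q‖_F² = n, minimizing ‖R − Q‖_F reduces to maximizing Σ Q_ij R_ij, which is a linear assignment problem over |R_ij| (solved here with SciPy's `linear_sum_assignment`).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def nearest_signed_permutation(R: np.ndarray) -> np.ndarray:
    """Return the signed permutation matrix closest to square matrix R in Frobenius norm.

    For any signed permutation Q, ||R - Q||_F^2 = ||R||_F^2 + n - 2 * sum(Q * R),
    so the closest Q maximizes sum(Q * R): pick the one-to-one assignment of
    rows to columns with the largest total |R[i, j]|, then copy each entry's sign.
    """
    # Maximize total |R[i, j]| by minimizing its negation (Hungarian algorithm).
    rows, cols = linear_sum_assignment(-np.abs(R))
    signs = np.sign(R[rows, cols])
    signs[signs == 0] = 1.0  # a zero entry can take either sign; pick +1
    Q = np.zeros(R.shape, dtype=float)
    Q[rows, cols] = signs
    return Q


# Usage: a random orthogonal matrix stands in for a learned distributed rotation.
rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.normal(size=(4, 4)))
Q = nearest_signed_permutation(R)
print(Q)
print("Frobenius distance:", np.linalg.norm(R - Q))
```

The distance printed at the end gives one rough indication of how far the learned rotation is from any localist (axis-aligned) solution.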
Neighborhood (ranked by edge count)
Methods (1)
- method: Algorithm that extracts a localist (axis-aligned) approximation from any learned orthogonal rotation matrix for baseline comparison.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- finding (similarity 0.793): Best localist alignment achieves IIA of 0.73 on hierarchical equality (Both Equality Relations in Layer 1). Shows that localist alignment fails to capture the distributed structure found by DAS.
- Localist methods fail entirely on MoNLI distributed representations.
- Prior assumption that high-level variables align with disjoint groups of neurons in the standard basis; contrasted with distributed representations.
- Measure of similarity between the similarity structures (kernels) induced by two different representations.
- A mapping assigning to each high-level variable a set of low-level variables and a function from low-level to high-level values.
- Alignment map ϕ(h) = W_orth h using an orthogonal matrix W_orth; assumes the linear representation hypothesis.
- Modified CKA metric that restricts cross-covariance to nearest neighbors; introduced in this paper's appendix.
- The goal of making model behavior match human values and intentions, often addressed during post-training.
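The kernel-similarity entry above reads like centered kernel alignment (CKA), given that a neighboring entry names a modified CKA metric. A minimal sketch of standard linear CKA follows, assuming row-stacked representations of the same inputs; it does not implement the nearest-neighbor variant, whose details are not given here, and `linear_cka` is a hypothetical name. Linear CKA is invariant to orthogonal transformations and isotropic scaling, so a rotation such as ϕ(h) = W_orth h leaves it unchanged.

```python
import numpy as np


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two representations of the same n inputs.

    X has shape (n, p) and Y has shape (n, q); rows must correspond to the same inputs.
    """
    # Center each feature (column) so the linear kernels X X^T and Y Y^T are centered.
    Xc = X - X.mean(axis=0, keepdims=True)
    Yc = Y - Y.mean(axis=0, keepdims=True)
    # Numerator: squared Frobenius norm of the cross-covariance between representations.
    cross = np.linalg.norm(Yc.T @ Xc, "fro") ** 2
    # Denominator normalizes by each representation's self-similarity.
    norm_x = np.linalg.norm(Xc.T @ Xc, "fro")
    norm_y = np.linalg.norm(Yc.T @ Yc, "fro")
    return float(cross / (norm_x * norm_y))


# Usage: rows are h^T, so applying phi(h) = W_orth h to every row is X @ W_orth.T.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
W_orth, _ = np.linalg.qr(rng.normal(size=(8, 8)))
print(linear_cka(X, X @ W_orth.T))               # 1.0 up to floating-point error
print(linear_cka(X, rng.normal(size=(100, 8))))  # lower for unrelated features
```

This invariance is one reason a representation-similarity score alone cannot tell a distributed (rotated) encoding apart from a localist one.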
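The alignment map ϕ(h) = W_orth h is typically used inside an interchange intervention: coordinates of the rotated base representation are overwritten with those of a source representation, and the result is rotated back. IIA is then the fraction of such interventions whose output matches the high-level model's counterfactual prediction. The sketch below assumes a single hidden vector per input and a hand-picked set of rotated coordinates for the variable; `rotated_interchange` and `dims` are hypothetical names.

```python
import numpy as np


def rotated_interchange(h_base, h_source, W_orth, dims):
    """Interchange intervention in the rotated basis phi(h) = W_orth @ h.

    Coordinates `dims` of phi(h_base) are overwritten with those of phi(h_source),
    and the result is mapped back to the neuron basis with W_orth.T (the inverse
    of an orthogonal matrix).
    """
    z = W_orth @ h_base            # base representation in the rotated basis (new array)
    z_source = W_orth @ h_source   # source representation in the rotated basis
    z[dims] = z_source[dims]       # patch the coordinates assigned to the variable
    return W_orth.T @ z            # undo the rotation


# Usage: a 4-neuron layer, a random rotation, and a variable assigned to 2 rotated dims.
rng = np.random.default_rng(0)
h_base, h_source = rng.normal(size=4), rng.normal(size=4)
W_orth, _ = np.linalg.qr(rng.normal(size=(4, 4)))
print(rotated_interchange(h_base, h_source, W_orth, [0, 1]))
```

When W_orth is a signed permutation, as in the localist baseline, patching rotated coordinates reduces to copying a disjoint group of raw neurons from source to base, which is exactly the standard-basis assumption that distributed alignment relaxes.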