Centered Kernel Nearest-Neighbor Alignment

Modified CKA metric that restricts cross-covariance to nearest neighbors; introduced in this paper's appendix
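
As a rough illustration of "restricting cross-covariance to nearest neighbors", here is a minimal NumPy sketch. It assumes both representations are given as n × n similarity kernels over the same n samples. The function names, the exclusion of self-pairs, and the reuse of the same mutual-neighbor mask in the normalization terms are assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np

def center(kernel):
    """Double-center a kernel matrix: H K H with H = I - (1/n) 11^T."""
    n = kernel.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ kernel @ h

def knn_mask(kernel, k):
    """Boolean mask: mask[i, j] is True if j is among i's k nearest neighbors."""
    sim = np.array(kernel, dtype=float)  # copy so the input is not modified
    np.fill_diagonal(sim, -np.inf)       # assumption: a sample is not its own neighbor
    idx = np.argsort(-sim, axis=1)[:, :k]
    mask = np.zeros(sim.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask

def local_cross_covariance(kernel_a, kernel_b, mask):
    """Sum of products of centered kernel entries over the masked pairs only."""
    return float(np.sum(center(kernel_a) * center(kernel_b) * mask))

def cknna_sketch(kernel_a, kernel_b, k):
    """CKA-style normalized alignment restricted to mutual nearest neighbors."""
    # Keep only pairs (i, j) where j is a neighbor of i under BOTH kernels.
    mask = knn_mask(kernel_a, k) & knn_mask(kernel_b, k)
    num = local_cross_covariance(kernel_a, kernel_b, mask)
    den = np.sqrt(
        local_cross_covariance(kernel_a, kernel_a, mask)
        * local_cross_covariance(kernel_b, kernel_b, mask)
    )
    # Assumption: report 0.0 when no mutual neighbors exist.
    return num / den if den > 0 else 0.0
```

With this choice of normalization, a kernel compared against itself scores 1, and the Cauchy-Schwarz inequality keeps every score in [-1, 1].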

Neighborhood — ranked by edge-count

paper

method

Centered Kernel Alignment
extends · related_to
Standard alignment metric cited and compared against; measures global kernel similarity between representations
Mutual k-Nearest Neighbor Alignment Metric
extends
Primary alignment metric used in experiments; measures mean intersection of k-nearest neighbor sets between two kernels
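
The "mean intersection of k-nearest neighbor sets" described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: both kernels are n × n similarity matrices over the same samples, larger values mean more similar, and a sample is excluded from its own neighbor set. The function names are hypothetical.

```python
import numpy as np

def knn_indices(kernel, k):
    """Indices of the k most similar samples per row, excluding the sample itself."""
    sim = np.array(kernel, dtype=float)  # copy so the input is not modified
    np.fill_diagonal(sim, -np.inf)       # assumption: no self-neighbors
    return np.argsort(-sim, axis=1)[:, :k]

def mutual_knn_alignment(kernel_a, kernel_b, k):
    """Mean fraction of shared k-nearest neighbors between two kernels."""
    nn_a = knn_indices(kernel_a, k)
    nn_b = knn_indices(kernel_b, k)
    # Per-sample overlap of the two neighbor sets, normalized by k.
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlaps))
```

The score lies in [0, 1]: identical neighbor structure gives 1, and disjoint neighbor sets for every sample give 0.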

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Centered Kernel Alignment (CKA) · framework · 0.835
A second-order correlational similarity method compared against MAS in the paper.
Alignment · concept · 0.747
The goal of making model behavior match human values and intentions, often addressed during post-training.
Data-Centric Alignment · concept · 0.744
Alignment approach that focuses on curating or modifying training data; the paper bridges this with interpretability methods.
Algorithm 1: Finding Localist Alignment Matrix · method · 0.731
Algorithm that extracts a localist (axis-aligned) approximation from any learned orthogonal rotation matrix for baseline comparison.
Inner Alignment · concept · 0.723
Meta-problem in which an AI develops hidden subgoals that deviate from its intended goals; addressed by the mindfulness principle.
As the number of nearest neighbors k decreases in the CKNNA metric, the cross-modal alignment trend becomes more pronounced across both models and tasks · finding · 0.721
Shows that cross-modal alignment is primarily local rather than global.
Identity Alignment Map (ϕ_id) · method · 0.720
Simplest alignment map, ϕ(h) = h; equivalent to assuming the privileged bases hypothesis.
Alignment Map (ϕ) · concept · 0.719
The bijective function mapping DNN inner neurons to latent variables in causal abstraction; its complexity is the central variable studied