Centered Kernel Alignment

A standard alignment metric, cited and used as a comparison baseline; measures the global similarity between the kernels induced by two representations.
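
A minimal NumPy sketch of CKA follows. It assumes the standard HSIC-based definition from Kornblith et al.: each representation induces an n × n Gram (kernel) matrix over the same n inputs, both matrices are double-centered, and CKA(K, L) = HSIC(K, L) / sqrt(HSIC(K, K) · HSIC(L, L)). Only the linear kernel K = X Xᵀ is shown, and the function names `center_gram`, `cka`, and `linear_cka` are illustrative rather than taken from any library.

```python
import numpy as np


def center_gram(K: np.ndarray) -> np.ndarray:
    """Double-center an n x n Gram matrix: H @ K @ H with H = I - (1/n) * ones."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H


def cka(K: np.ndarray, L: np.ndarray) -> float:
    """CKA between two kernel matrices computed over the same n inputs."""
    Kc, Lc = center_gram(K), center_gram(L)
    # For symmetric matrices, sum(A * B) == trace(A @ B); this equals HSIC up to
    # a 1/(n-1)^2 factor that cancels in the normalized ratio below.
    hsic_kl = np.sum(Kc * Lc)
    hsic_kk = np.sum(Kc * Kc)
    hsic_ll = np.sum(Lc * Lc)
    return float(hsic_kl / np.sqrt(hsic_kk * hsic_ll))


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between feature matrices X (n x d1) and Y (n x d2) with aligned rows."""
    return cka(X @ X.T, Y @ Y.T)


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 16))
Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))  # random orthogonal matrix
print(linear_cka(X, 3.0 * X @ Q))                    # 1.0 up to floating point
print(linear_cka(X, rng.standard_normal((100, 8))))  # lower for unrelated random features
```

Because the centered kernels are compared only through a normalized inner product, CKA is invariant to orthogonal transformations and isotropic scaling of the features, which the first check exercises.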

Neighborhood — ranked by edge-count

thinker

Kornblith et al. · introduces
Introduced CKA and observed that model alignment increases with model scale and dataset size.

concept

Representational Alignment · implements
A measure of similarity between the similarity structures (kernels) induced by two different representations.

method

Centered Kernel Nearest-Neighbor Alignment · extends, related_to
A modified CKA metric that restricts the cross-covariance to nearest neighbors; introduced in this paper's appendix.

Mutual k-Nearest Neighbor Alignment Metric · extends
The primary alignment metric used in the experiments; measures the mean intersection of the k-nearest-neighbor sets induced by two kernels.
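
The two neighbor-based metrics listed above can be sketched in NumPy as follows. This is a hedged reading of their one-line descriptions, not the paper's reference code. Assumptions: features are L2-normalized so the kernel is cosine similarity, a point is never its own neighbor, and for CKNNA the centered CKA cross term is kept only for pairs (i, j) where j is among the k nearest neighbors of i in both representations, with each self term normalized under its own neighbor mask. All function names are hypothetical.

```python
import numpy as np


def cosine_kernel(X: np.ndarray) -> np.ndarray:
    """Kernel of L2-normalized features (cosine similarity between rows)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T


def knn_mask(K: np.ndarray, k: int) -> np.ndarray:
    """M[i, j] is True iff j is one of the k most similar points to i (self excluded)."""
    S = K.astype(float)  # astype copies, so K itself is left untouched
    np.fill_diagonal(S, -np.inf)
    idx = np.argsort(-S, axis=1)[:, :k]
    M = np.zeros(K.shape, dtype=bool)
    M[np.arange(K.shape[0])[:, None], idx] = True
    return M


def mutual_knn_alignment(X: np.ndarray, Y: np.ndarray, k: int) -> float:
    """Mean fraction of shared k-nearest neighbors per input; lies in [0, 1]."""
    Mx = knn_mask(cosine_kernel(X), k)
    My = knn_mask(cosine_kernel(Y), k)
    return float(np.mean(np.sum(Mx & My, axis=1) / k))


def cknna(X: np.ndarray, Y: np.ndarray, k: int) -> float:
    """CKA with the centered cross term restricted to mutual-neighbor pairs (assumed form)."""
    K, L = cosine_kernel(X), cosine_kernel(Y)
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc, Lc = H @ K @ H, H @ L @ H
    Mk, Ml = knn_mask(K, k), knn_mask(L, k)
    cross = np.sum((Mk & Ml) * Kc * Lc)
    self_k = np.sum(Mk * Kc * Kc)
    self_l = np.sum(Ml * Lc * Lc)
    return float(cross / np.sqrt(self_k * self_l))


rng = np.random.default_rng(0)
X = rng.standard_normal((60, 32))
Y = rng.standard_normal((60, 32))
print(mutual_knn_alignment(X, X, k=10))  # 1.0 for identical representations
print(mutual_knn_alignment(X, Y, k=10))  # near chance; expected value is about k / (n - 1)
print(cknna(X, X, k=10))                  # 1.0 up to floating point
```

Dense n × n kernels keep the sketch short; for large n, an approximate nearest-neighbor index would typically replace the full `argsort`.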

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Centered Kernel Alignment (CKA) · framework · 0.881
A second-order correlational similarity method compared against MAS in the paper.
Alignment · concept · 0.759
The goal of making model behavior match human values and intentions, often addressed during post-training.
Data-Centric Alignment · concept · 0.753
Alignment approach that focuses on curating or modifying training data; the paper bridges this with interpretability methods.
Inner Alignment · concept · 0.747
The meta-problem in which an AI system develops hidden subgoals that deviate from its intended goals; addressed by the mindfulness principle.
Centeredness · concept · 0.734
The defining mark of a center: the appearance of being a focal zone within a larger whole.
Inner alignment framework · framework · 0.714
The distinction between inner and outer alignment, referenced multiple times.
Algorithm 1: Finding Localist Alignment Matrix · method · 0.711
An algorithm that extracts a localist (axis-aligned) approximation from any learned orthogonal rotation matrix, used as a baseline for comparison.
Alignment Function · concept · 0.703
A learnable invertible transformation in DAS that maps neural representations to a basis aligned with causal variables.