method
active
method:centered-kernel-alignment
Centered Kernel Alignment
Standard alignment metric that the paper cites and compares against; measures global similarity between the kernels induced by two representations
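As a rough illustration of what "global kernel similarity" means here, the sketch below computes CKA as the normalized Frobenius inner product of two centered Gram matrices. The function names (`center_gram`, `cka`, `linear_kernel`) are hypothetical, and the linear kernel is an assumption made for the example; this is not the implementation used in any of the linked papers.

```python
import numpy as np

def linear_kernel(X):
    # Gram matrix of a linear kernel: one row/column per sample.
    # (Kernel choice is an assumption made for this sketch.)
    return X @ X.T

def center_gram(K):
    # Double-center the Gram matrix with H = I - (1/n) 11^T.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def cka(K, L):
    # Normalized alignment between two centered kernels:
    # <Kc, Lc>_F / (||Kc||_F * ||Lc||_F)
    Kc = center_gram(K)
    Lc = center_gram(L)
    hsic = np.sum(Kc * Lc)
    return hsic / (np.linalg.norm(Kc) * np.linalg.norm(Lc))

# Usage: two representations of the same 40 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
Y = rng.normal(size=(40, 9))
score = cka(linear_kernel(X), linear_kernel(Y))
```

Because every pair of samples contributes to the inner product, the score reflects the whole similarity structure rather than only local neighborhoods. With a linear kernel it is unchanged by rotating, uniformly rescaling, or shifting either representation.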
Neighborhood — ranked by edge-count
Papers (1)
paper
- The Platonic Representation Hypothesis · cites, mentions
Thinkers (1)
thinker
- Kornblith et al. · introduces · Introduced CKA and observed that model alignment increases with model scale and dataset size
Findings (1)
finding
- CKA shows only a very weak trend of alignment between models, even within a modality, whereas mutual k-NN shows stronger trends · associated_with · Explains why mutual k-NN was chosen over CKA as the primary metric
Concepts (1)
concept
- Representational Alignment · implements · Measure of similarity between the similarity structures (kernels) induced by two different representations
Methods (2)
method
- Centered Kernel Nearest-Neighbor Alignment · extends, related_to · Modified CKA metric that restricts the cross-covariance to nearest neighbors; introduced in this paper's appendix
- Primary alignment metric used in experiments; measures mean intersection of k-nearest neighbor sets between two kernels
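The neighbor-overlap metric described in the last entry above ("mean intersection of k-nearest neighbor sets between two kernels") can be sketched roughly as follows. The function name `mutual_knn`, the exclusion of each sample from its own neighbor set, and the division by `k` to get a score in [0, 1] are assumptions made for this example, not details confirmed by this page.

```python
import numpy as np

def mutual_knn(K, L, k):
    # K, L: (n, n) similarity matrices over the same n samples.
    # Returns the mean fraction of shared k-nearest neighbors.
    def top_k(S):
        S = np.array(S, dtype=float)      # work on a copy
        np.fill_diagonal(S, -np.inf)      # assumption: a sample is not its own neighbor
        return np.argsort(-S, axis=1)[:, :k]

    neighbors_k = top_k(K)
    neighbors_l = top_k(L)
    overlaps = [
        len(set(a) & set(b)) / k          # assumption: normalize by k
        for a, b in zip(neighbors_k, neighbors_l)
    ]
    return float(np.mean(overlaps))

# Usage: similarity = negative distance between 1-D points.
def neg_dist_kernel(points):
    p = np.asarray(points, dtype=float)
    return -np.abs(p[:, None] - p[None, :])

K = neg_dist_kernel([0, 1, 3, 7])
L = neg_dist_kernel([0, 7, 3, 1])
same = mutual_knn(K, K, k=1)
different = mutual_knn(K, L, k=1)
```

Unlike CKA, only the identity of each sample's nearest neighbors matters here, so the score depends on local structure and ignores how far apart distant samples are.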
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- A second-order correlational similarity method compared against MAS in the paper.
- The goal of making model behavior match human values and intentions, often addressed during post-training.
- Alignment approach that focuses on curating or modifying training data; the paper bridges this with interpretability methods.
- Meta-problem where AI develops hidden subgoals deviating from intended goals; addressed by mindfulness principle
- The defining mark of a center: the appearance of being a focal zone within a larger whole.
- The concept of inner vs outer alignment, referenced multiple times.
- Algorithm that extracts a localist (axis-aligned) approximation from any learned orthogonal rotation matrix for baseline comparison.
- A learnable invertible transformation in DAS that maps neural representations to a basis aligned with causal variables