concept
active
Representational Alignment (concept:representational-alignment)
Measure of similarity between the similarity structures (kernels) induced by two different representations
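Stated as a formula, this assumes an inner-product kernel (a common choice, not the only one). Here f and g are the two representations, x_i and x_j are datapoints, and m is any similarity metric over kernels, such as the methods listed on this page:

```latex
K_f(x_i, x_j) = \langle f(x_i), f(x_j) \rangle, \qquad
K_g(x_i, x_j) = \langle g(x_i), g(x_j) \rangle, \qquad
\operatorname{alignment}(f, g) = m\left(K_f, K_g\right)
```

For n datapoints, each kernel is an n × n matrix. The metric m therefore compares two matrices of the same shape, even when f and g produce features of different dimensions.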
Neighborhood (ranked by edge count)
Methods (2)
- Centered Kernel Alignment · implements · Standard alignment metric, cited and compared against; measures global kernel similarity between representations
- Primary alignment metric used in the experiments; measures the mean intersection of the k-nearest-neighbor sets computed under the two kernels
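The two metrics above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the underlying work. Linear CKA uses the centered-feature form ||Yᵀ X||²_F / (||Xᵀ X||_F · ||Yᵀ Y||_F). For the k-nearest-neighbor metric, the sketch assumes that neighbors are ranked by cosine similarity and that each point is excluded from its own neighbor set. All function names are hypothetical. Rows of X and Y must describe the same datapoints in the same order, and no row may be all zeros.

```python
import math
from typing import List, Set

Matrix = List[List[float]]


def center_columns(X: Matrix) -> Matrix:
    """Subtract each feature's (column's) mean so every feature has zero mean."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]


def cross_frobenius_sq(A: Matrix, B: Matrix) -> float:
    """Squared Frobenius norm of A^T B, where A is n x p and B is n x q."""
    total = 0.0
    for i in range(len(A[0])):
        for j in range(len(B[0])):
            s = sum(A[k][i] * B[k][j] for k in range(len(A)))
            total += s * s
    return total


def linear_cka(X: Matrix, Y: Matrix) -> float:
    """Linear CKA: global similarity of the kernels induced by X and Y."""
    Xc, Yc = center_columns(X), center_columns(Y)
    num = cross_frobenius_sq(Xc, Yc)
    den = math.sqrt(cross_frobenius_sq(Xc, Xc) * cross_frobenius_sq(Yc, Yc))
    return num / den


def knn_sets(X: Matrix, k: int) -> List[Set[int]]:
    """For each row, the indices of its k most cosine-similar other rows."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in X]
    result = []
    for i, xi in enumerate(X):
        sims = []
        for j, xj in enumerate(X):
            if i != j:  # a point is not its own neighbor
                dot = sum(a * b for a, b in zip(xi, xj))
                sims.append((dot / (norms[i] * norms[j]), j))
        sims.sort(key=lambda t: (-t[0], t[1]))  # most similar first; ties by index
        result.append({j for _, j in sims[:k]})
    return result


def mutual_knn_alignment(X: Matrix, Y: Matrix, k: int) -> float:
    """Mean fraction of shared k-nearest neighbors across all datapoints."""
    nx, ny = knn_sets(X, k), knn_sets(Y, k)
    return sum(len(a & b) / k for a, b in zip(nx, ny)) / len(X)


X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# A second representation of the same four datapoints whose nearest-neighbor pairs differ.
Y = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(linear_cka(X, X), mutual_knn_alignment(X, X, k=1))
print(linear_cka(X, Y), mutual_knn_alignment(X, Y, k=1))
```

Both metrics equal 1 when a representation is compared with itself. In this toy example, Y pairs every point with a different neighbor, so the mutual k-NN score drops to 0.0. CKA aggregates over all pairs, whereas the k-NN score only inspects each point's local neighborhood.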
Concepts (3)
- Kernel (representational) · associated_with, implements · A function characterizing how a representation measures distance or similarity between datapoints; used to compare representations
- Representational Convergence · associated_with · The central empirical phenomenon: different neural networks trained on different data and objectives develop increasingly similar representations
- Cross-Modal Alignment · extends · The alignment between representations learned from different data modalities, such as vision and language
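Comparing kernels rather than raw features means two representations can be aligned even when their coordinates differ. The sketch below illustrates this under stated assumptions. It uses a cosine (unit-normalized inner-product) kernel, and the second "representation" is simply the first one rotated by a hypothetical helper, `rotate2d`. The two feature matrices differ, but the kernels they induce match up to floating-point rounding.

```python
import math
from typing import List

Matrix = List[List[float]]


def l2_normalize(X: Matrix) -> Matrix:
    """Scale each feature vector to unit length (rows are assumed non-zero)."""
    out = []
    for row in X:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row])
    return out


def inner_product_kernel(X: Matrix) -> Matrix:
    """K[i][j] = <f(x_i), f(x_j)> on unit-normalized features, i.e. cosine similarity."""
    Z = l2_normalize(X)
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in Z] for zi in Z]


def rotate2d(X: Matrix, theta: float) -> Matrix:
    """Hypothetical second representation: the same 2-D features rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * x - s * y, s * x + c * y] for x, y in X]


def max_abs_diff(A: Matrix, B: Matrix) -> float:
    """Largest element-wise difference between two equally shaped matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
K_f = inner_product_kernel(X)
K_g = inner_product_kernel(rotate2d(X, 0.7))
# The features differ, but the induced kernels agree up to rounding error.
print(max_abs_diff(K_f, K_g))
```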
Related by similarity (8)
cosine ≥ 0.65 · no typed edge · Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The goal of making model behavior match human values and intentions, often addressed during post-training.
- Open methodological question, acknowledged as a limitation
- A learnable invertible transformation in DAS that maps neural representations to a basis aligned with causal variables
- The problem of ensuring AI systems adopt values compatible with human welfare; argued to be a perennial problem already present in child-rearing
- OpenAI's approach integrating chain-of-thought reasoning into alignment; parallels contemplative self-monitoring
- Field within which this work has implications for evaluating alignment progress.
- Alignment approach that focuses on curating or modifying training data; the paper bridges this with interpretability methods.
- The only statistically significant predictor of koan battery scores (p=0.006); includes Constitutional AI, RLHF, SFT, roleplay, empathy