finding
active
finding:among-78-vision-models-those-solving-more-vtab-tasks-higher-transfer-performance-show-higher-mutual-nearest-neighbor-alignment-with-each-other
Among 78 vision models, those solving more VTAB tasks (higher transfer performance) show higher mutual nearest-neighbor alignment with each other
Key empirical finding establishing that representational alignment correlates with model competence
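Below is a minimal sketch of the mutual nearest-neighbor alignment score and of a competence-binned comparison, written in plain Python. It assumes that each model is given as a list of feature vectors for the same N inputs and that cosine similarity defines the k-NN sets. The function names (`mutual_knn_alignment`, `alignment_by_competence`), the value of k and the bin edges are hypothetical illustrations, not the authors' code. The score averages |S_A(i) ∩ S_B(i)| / k over inputs: 1.0 means both models induce identical neighborhoods, and 0.0 means no neighbor is shared.

```python
import math
from itertools import combinations
from statistics import mean


def _normalize(v):
    # Scale a vector to unit length so dot products equal cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def knn_sets(feats, k):
    """For each row, the indices of its k most cosine-similar other rows."""
    unit = [_normalize(v) for v in feats]
    sets = []
    for i, u in enumerate(unit):
        sims = [(sum(a * b for a, b in zip(u, w)), j)
                for j, w in enumerate(unit) if j != i]
        # Sort by descending similarity; break ties by index for determinism.
        sims.sort(key=lambda t: (-t[0], t[1]))
        sets.append({j for _, j in sims[:k]})
    return sets


def mutual_knn_alignment(feats_a, feats_b, k):
    """Mean over inputs of |S_A(i) & S_B(i)| / k (hypothetical helper)."""
    if len(feats_a) != len(feats_b):
        raise ValueError("both models must embed the same inputs")
    sa, sb = knn_sets(feats_a, k), knn_sets(feats_b, k)
    return mean(len(a & b) / k for a, b in zip(sa, sb))


def alignment_by_competence(features, tasks_solved, bins, k):
    """Average pairwise alignment among models whose task count falls in [lo, hi).

    features: model name -> feature rows; tasks_solved: model name -> count.
    Bins with fewer than two models are omitted (hypothetical binning scheme).
    """
    result = {}
    for lo, hi in bins:
        members = [m for m, t in tasks_solved.items() if lo <= t < hi]
        pairs = list(combinations(members, 2))
        if pairs:
            result[(lo, hi)] = mean(
                mutual_knn_alignment(features[a], features[b], k) for a, b in pairs)
    return result


# Toy usage: "a2" is a rescaled copy of "a"; "c" swaps which inputs are paired.
a = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
a2 = [[2 * x for x in row] for row in a]
c = [[1, 0], [0, 1], [0.9, 0.1], [0.1, 0.9]]
features = {"a": a, "a2": a2, "c": c}
tasks_solved = {"a": 15, "a2": 16, "c": 3}
binned = alignment_by_competence(features, tasks_solved, [(0, 10), (10, 20)], k=1)
```

Under these assumptions, a model and a rescaled copy of it score 1.0, and a model whose neighbor pairs are swapped scores 0.0. The finding corresponds to this per-bin average being higher for bins of models that solve more VTAB tasks.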
Source paper
extracted_from · (2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola
Neighborhood — ranked by edge-count
Claims (1)
claim
- Authors' interpretation of the VTAB alignment results, echoing Tolstoy's Anna Karenina opening ("all happy families are alike")
Hypotheses (3)
hypothesis
- Different neural network models trained on different objectives and modalities are converging to a shared statistical model of reality in their representation spaces · associated_with, supports · The central hypothesis of the paper; the platonic representation hypothesis itself
- Multitask Scaling Hypothesis · supports · Argues that there are fewer representations competent for N tasks than for M < N tasks, so more general models have a smaller solution space
- Capacity Hypothesis · supports · Bigger models are more likely to converge to a shared representation than smaller models because they can better approximate the global optimum
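The Multitask Scaling Hypothesis rests on a counting argument: each additional task rules out the representations that fail it, so the set of competent representations can only shrink or stay the same. The toy below illustrates that argument under a loudly simplified assumption that is not the paper's formalism: a "representation" is any Boolean function on 3-bit inputs (256 in total), and a "task" fixes the required output on a few inputs. The task definitions and helper names are hypothetical.

```python
from itertools import product

# Hypothetical toy setup: every Boolean function on 3-bit inputs is a candidate
# "representation", encoded as a tuple of outputs, one per input in INPUTS.
INPUTS = list(product((0, 1), repeat=3))
POS = {x: i for i, x in enumerate(INPUTS)}
HYPOTHESES = list(product((0, 1), repeat=len(INPUTS)))  # 2**8 = 256 functions


def competent(h, task):
    """A task maps inputs to required outputs; h must match on all of them."""
    return all(h[POS[x]] == y for x, y in task.items())


def count_competent(tasks):
    """Number of candidate functions that satisfy every task in `tasks`."""
    return sum(all(competent(h, t) for t in tasks) for h in HYPOTHESES)


# Hypothetical tasks; together they constrain five distinct inputs.
tasks = [
    {(0, 0, 0): 0, (1, 1, 1): 1},
    {(0, 1, 0): 1},
    {(1, 0, 0): 0, (0, 0, 1): 1},
]
counts = [count_competent(tasks[:n]) for n in range(len(tasks) + 1)]
```

In this toy the count falls from 256 to 64, 32 and then 8 as tasks are added, because each newly constrained input halves the consistent set.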
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Empirical result showing alignment increases with model competence
- Key cross-modal alignment result
- Claims that alignment score is a proxy for general capability
- Empirical evidence for the universality hypothesis cited as supporting the possibility of convergent consciousness-like solutions
- Empirical finding supporting the Universality Hypothesis; extended by the paper to consciousness
- Lenc & Vedaldi result illustrating data independence in representations and layer-wise alignment
- Demonstrated that CNN representations predict the responses of neurons in visual cortex; background motivation for the correspondence between neural networks and the brain
- Core cross-modal empirical result: larger and better language models align better with vision models