finding
active
finding:llm-alignment-to-dinov2-vision-model-shows-a-linear-relationship-with-hellaswag-commonsense-reasoning-performance
LLM alignment to DINOv2 vision model shows a linear relationship with HellaSwag (commonsense reasoning) performance
Supports the claim that cross-modal alignment predicts downstream language task performance
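A "linear relationship" of this kind can be checked with an ordinary least-squares fit and a Pearson correlation. The sketch below is only an illustration: the function name `linear_fit` is hypothetical, and the numbers are placeholder values chosen to lie on a line, not results from the source paper.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Least-squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("inputs must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical placeholder data, NOT measurements from the paper:
# one alignment score and one HellaSwag accuracy per language model.
alignment_scores = [0.10, 0.12, 0.14, 0.16, 0.18]
hellaswag_scores = [0.40, 0.50, 0.60, 0.70, 0.80]

slope, intercept, r = linear_fit(alignment_scores, hellaswag_scores)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

An r close to 1 with roughly structureless residuals would be consistent with a linear trend. A curve that stays flat and then rises sharply would not, even if r were moderately high.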
Source paper
extracted_from · (2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola
Neighborhood — ranked by edge-count
Claims (1)
claim
- Claims that alignment score is a proxy for general capability
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- LLM alignment score to DINOv2 shows an emergence-esque trend with GSM8K mathematical reasoning performance · finding · 0.842 · Alignment predicts math performance with emergent pattern
- Core cross-modal empirical result: larger and better language models align better with vision models
- Key cross-modal alignment result
- Establishes that the observed linear structure is not merely a representation of text probability
- Merullo et al. result on cross-modal representational compatibility
- Ngo & Kim result extending cross-modal convergence to the auditory domain
- Theoretical interpretation of antipodal alignment and misalignment phenomena in PCA visualizations
- Demonstrates that high IIA can be obtained even when model cannot solve the task
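The "Related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter over entity embeddings. Everything in this example is an assumption made for illustration: the entity ids, the two-dimensional vectors, the `typed_edges` set, and the function names are hypothetical and do not describe how the listing was actually produced.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_without_typed_edge(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above threshold that lack a typed edge."""
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue
        # Skip entities already connected to the target by a typed relation.
        if frozenset((target_id, other_id)) in typed_edges:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, score))
    # Highest similarity first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings, for illustration only.
embeddings = {
    "hellaswag_finding": [1.0, 0.0],
    "gsm8k_finding": [0.8, 0.6],             # similar, no typed edge
    "general_capability_claim": [1.0, 0.1],  # similar, but already has a typed edge
    "unrelated_entity": [0.0, 1.0],          # below the threshold
}
typed_edges = {frozenset(("hellaswag_finding", "general_capability_claim"))}

neighbors = similar_without_typed_edge("hellaswag_finding", embeddings, typed_edges)
print(neighbors)
```

Entities that pass this filter are the ones worth reviewing by hand, either to add a typed relation or to merge as duplicates.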