finding
active
finding:llms-trained-only-on-language-data-have-rich-enough-knowledge-of-visual-structures-that-decent-visual-representations-can-be-trained-on-images-generated-solely-by-querying-the-llm
LLMs trained only on language data have rich enough knowledge of visual structures that decent visual representations can be trained on images generated solely by querying the LLM
Sharma et al. result supporting cross-modal alignment: language-only models implicitly encode visual structure
Source paper
extracted_from (2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola
Neighborhood — ranked by edge-count
Claims (1)
claim
- Supporting evidence for cross-modal platonic representation
Hypotheses (1)
hypothesis
- Implication of PRH for language model visual grounding
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Training on image data should improve LLM performance, and training on language data should improve vision model performance · hypothesis · 0.872 · Implication of PRH for cross-modal training efficiency
- Out-of-context reasoning work directly related to synthetic document fine-tuning experiments
- Core cross-modal empirical result: larger and better language models align better with vision models
- Establishes that the observed linear structure is not merely a representation of text probability
- Understanding how LMs learn linguistic behaviours may offer insights into fundamental properties of language · hypothesis · 0.817 · Forward-looking hypothesis linking LM mechanism analysis to linguistic theory
- Interpretation of the layer-by-layer PCA visualizations showing linear structure emerging in early-middle layers
- Qualified positive claim from spatio permutation analysis where two cases satisfy all three criteria
- Discussion of potential confounds
Restated by (1)
cosine ≥ 0.90
Other entities that say roughly the same thing. May be merge candidates or independent restatements across papers.