claim
active
claim:llms-trained-only-on-language-data-have-rich-knowledge-of-visual-structures-sufficient-to-train-decent-visual-representations
LLMs trained only on language data have rich knowledge of visual structures sufficient to train decent visual representations
Supporting evidence for cross-modal platonic representation
Source paper
extracted_from · (2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola
Neighborhood — ranked by edge-count
Findings (1)
finding
- Sharma et al. result supporting cross-modal alignment: language-only models implicitly encode visual structure
Hypotheses (1)
hypothesis
- Implication of PRH for cross-modal training efficiency
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Out-of-context reasoning work directly related to synthetic document fine-tuning experiments
- Understanding how LMs learn linguistic behaviours may offer insights into fundamental properties of language · hypothesis · 0.819 · Forward-looking hypothesis linking LM mechanism analysis to linguistic theory
- Core cross-modal empirical result: larger and better language models align better with vision models
- Establishes that the observed linear structure is not merely a representation of text probability
- Interpretation of the layer-by-layer PCA visualizations showing linear structure emerging in early-middle layers
- Binder et al. finding cited as evidence that LLMs possess introspective capacity analogous to mindfulness
- Discussion of potential confounds
- Qualified positive claim from spatio permutation analysis where two cases satisfy all three criteria
Restated by (1)
cosine ≥ 0.90
Other entities that say roughly the same thing. May be merge candidates or independent restatements across papers.