finding
active
finding:llms-trained-only-on-language-data-have-rich-enough-knowledge-of-visual-structures-that-decent-visual-representations-can-be-trained-on-images-generated-solely-by-querying-the-llm

LLMs trained only on language data have rich enough knowledge of visual structures that decent visual representations can be trained on images generated solely by querying the LLM

Result from Sharma et al. (2024), cited in support of cross-modal alignment: language-only models implicitly encode visual structure.

Source paper

extracted_from
The Platonic Representation Hypothesis
(2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola

Neighborhood — ranked by edge-count

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Restated by (1)

cosine ≥ 0.90

Other entities that say roughly the same thing. They may be merge candidates or independent restatements across papers.