finding
active
finding:a-single-linear-projection-is-sufficient-to-stitch-a-vision-model-to-an-llm-and-achieve-good-performance-on-visual-question-answering-and-image-captioning

A single linear projection is sufficient to stitch a vision model to an LLM and achieve good performance on visual question answering and image captioning

Merullo et al. result on cross-modal representational compatibility
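
A minimal sketch of what "a single linear projection" could look like in practice. It assumes the vision model and the LLM are both frozen, and that the projected image features are prepended to the text token embeddings as extra input tokens. The function names, dimensions, and random stand-in features are hypothetical, chosen for illustration only, and are not taken from the source paper.

```python
import numpy as np

def project_image_tokens(image_features, weight, bias):
    """Map vision features (n_patches, d_vision) into the LLM's
    input-embedding space (n_patches, d_llm) with one affine map."""
    return image_features @ weight + bias

def build_llm_inputs(image_features, text_embeddings, weight, bias):
    """Prepend the projected image tokens to the text token embeddings."""
    image_tokens = project_image_tokens(image_features, weight, bias)
    return np.concatenate([image_tokens, text_embeddings], axis=0)

# Hypothetical sizes; real encoders and LLMs are far larger.
d_vision, d_llm = 8, 16
n_patches, n_text = 4, 5

rng = np.random.default_rng(0)
image_features = rng.normal(size=(n_patches, d_vision))   # stand-in for frozen vision-model output
text_embeddings = rng.normal(size=(n_text, d_llm))        # stand-in for frozen LLM token embeddings

# The only trainable parameters in this sketch: one weight matrix and one bias.
weight = rng.normal(size=(d_vision, d_llm)) * 0.02
bias = np.zeros(d_llm)

llm_inputs = build_llm_inputs(image_features, text_embeddings, weight, bias)
print(llm_inputs.shape)  # (9, 16)
```

In this setup only `weight` and `bias` would be trained (for example on a captioning objective), which is what makes the result informative about how compatible the two representation spaces already are.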

Source paper

extracted_from
The Platonic Representation Hypothesis
(2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola

Neighborhood — ranked by edge-count

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.