hypothesis
active
hypothesis:training-on-image-data-should-improve-llm-performance-and-training-on-language-data-should-improve-vision-model-performance
Training on image data should improve LLM performance, and training on language data should improve vision model performance
Implication of the Platonic Representation Hypothesis (PRH) for cross-modal training efficiency
Source paper
extracted_from · (2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola
Neighborhood — ranked by edge-count
Findings (1)
finding
- OpenAI GPT-4V finding supporting cross-modal training benefit
Claims (1)
claim
- Supporting evidence for cross-modal platonic representation
Concepts (1)
concept
- Molyneux's Problem · analogous_to · Philosophical thought experiment: can a blind person, upon gaining sight, recognize shapes? Used to illuminate cross-modal grounding
Quotes (1)
quote
- The paper's central thesis statement, presented prominently after the abstract
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Sharma et al. result supporting cross-modal alignment: language-only models implicitly encode visual structure
- Core cross-modal empirical result: larger and better language models align better with vision models
- Key cross-modal alignment result
- Merullo et al. result on cross-modal representational compatibility
- Out-of-context reasoning work directly related to synthetic document fine-tuning experiments
- Warning that fictional narratives in training data increase risk of agents enacting dangerous self-preserving roles
- Lenc & Vedaldi result illustrating data independence in representations and layer-wise alignment
- Claims that alignment score is a proxy for general capability