finding
active
finding:better-llms-measured-by-1-bits-per-byte-on-openwebtext-show-a-linear-relationship-with-alignment-to-vision-models-measured-via-mutual-nearest-neighbor-on-wit
Better LLMs (measured by 1 − bits-per-byte on OpenWebText) show a linear relationship with alignment to vision models, measured via mutual nearest-neighbor on WIT
Key cross-modal alignment result
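The mutual nearest-neighbor measure named in this finding can be illustrated with a minimal sketch. It assumes paired samples (row i of each feature list describes the same item, such as an image and its caption), cosine similarity as the neighbor criterion, and a fixed neighborhood size k. The function names `knn_indices` and `mutual_knn_alignment` and the toy vectors are hypothetical, chosen for illustration; this is not the paper's implementation.

```python
import math


def knn_indices(feats, k):
    """For each row, return the set of indices of its k most similar other rows.

    Similarity is cosine similarity (an assumption made for this sketch).
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    unit = [normalize(v) for v in feats]
    neighbors = []
    for i, u in enumerate(unit):
        # Score every other row against row i, skipping row i itself.
        sims = [
            (sum(a * b for a, b in zip(u, w)), j)
            for j, w in enumerate(unit)
            if j != i
        ]
        sims.sort(key=lambda t: -t[0])  # most similar first
        neighbors.append({j for _, j in sims[:k]})
    return neighbors


def mutual_knn_alignment(feats_a, feats_b, k):
    """Average fraction of k nearest neighbors shared between two feature spaces."""
    if len(feats_a) != len(feats_b):
        raise ValueError("both feature lists must describe the same paired samples")
    nn_a = knn_indices(feats_a, k)
    nn_b = knn_indices(feats_b, k)
    shared = sum(len(a & b) for a, b in zip(nn_a, nn_b))
    return shared / (k * len(feats_a))


# Toy stand-ins for language-model and vision-model features of four paired items.
language_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
vision_feats = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

score_same = mutual_knn_alignment(language_feats, language_feats, k=1)
score_cross = mutual_knn_alignment(language_feats, vision_feats, k=1)
```

A score near 1 means the two models place the same items next to each other; a score near 0 means their neighborhoods rarely overlap. In the finding above, this kind of score is the quantity compared against language-model performance.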
Source paper
extracted_from (2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola
Neighborhood — ranked by edge-count
Claims (1)
claim
- Primary empirical claim of the paper
Quotes (1)
quote
- The paper's central thesis statement, presented prominently after the abstract
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core cross-modal empirical result: larger and better language models align better with vision models
- Establishes that the observed linear structure is not merely a representation of text probability
- Prior finding showing scale-dependent self-awareness, consistent with the scale effect observed in the paper's Experiment 1
- Supports claim that cross-modal alignment predicts downstream language task performance
- Key empirical finding establishing that representational alignment correlates with model competence
- Sharma et al. result supporting cross-modal alignment: language-only models implicitly encode visual structure
- Out-of-context reasoning work directly related to synthetic document fine-tuning experiments
- Binder et al. finding cited as evidence that LLMs possess introspective capacity analogous to mindfulness