hypothesis
active
hypothesis:higher-information-denser-captions-should-yield-higher-language-vision-alignment-scores
Higher information (denser) captions should yield higher language-vision alignment scores
Tests the information-level cap on cross-modal alignment
Source paper
extracted_from · (2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola
Neighborhood — ranked by edge-count
Findings (1)
finding
- Tests information-level cap on cross-modal alignment
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Supports the claim that information content of modality pairing determines alignment level
- Preliminary test of the information-level limitation of the Platonic Representation Hypothesis (PRH): denser captions yield higher cross-modal alignment
- Claims that alignment score is a proxy for general capability
- CLIP training paradigm finding in cross-modal alignment
- Core cross-modal empirical result: larger and better language models align better with vision models
- Quantitative bound on observed alignment; raises the open question of whether this gap reflects noise or real misalignment
- Shows that interpretability correlates with activation strength; most of the model effect comes from high activations
- Glaese et al. 2022: Improving alignment of dialogue agents via targeted human judgements · concept · 0.750 · Alignment paper cited as example of RLHF fine-tuning technique; ref 19
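
The "Related by similarity" criterion (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy 2-D embeddings, and the representation of typed edges as unordered ID pairs are hypothetical and do not describe how this page is actually generated.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs for entities that are similar to the
    target but share no typed edge with it, highest score first.

    embeddings:  dict mapping entity ID -> embedding vector (hypothetical store)
    typed_edges: set of frozenset({id_a, id_b}) pairs that already have a typed relation
    """
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # skip the entity itself
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed edge, so not a candidate
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy data: "c" is similar enough but already has a typed edge; "b" is dissimilar.
toy_embeddings = {
    "hypothesis": [1.0, 0.0],
    "a": [1.0, 0.1],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}
toy_typed_edges = {frozenset(("hypothesis", "c"))}
candidates = related_by_similarity("hypothesis", toy_embeddings, toy_typed_edges)
```

In the toy data only `a` survives both filters, which matches the intent stated above: such entities are candidates for new edges or unrecognized duplicates.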