hypothesis
active
A family of contrastive learners converges to a representation whose kernel is the pointwise mutual information (PMI) of the underlying events
Mathematical formalization of what representation models converge to
Source paper
extracted_from · (2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola
Neighborhood — ranked by edge-count
Concepts (2)
concept
- Pointwise Mutual Information Kernel · associated_with · The kernel that contrastive learners converge to; similarity equals PMI between observations
- Bijective Observation Function · supports · The idealized assumption that observations are bijective mappings of events, required for cross-modal convergence proof
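The two concepts above can be illustrated with a small sketch. It assumes a toy joint co-occurrence distribution over three made-up events; the function name `pmi_kernel`, the event labels, and the `observe` mapping are hypothetical and not taken from the source paper. The sketch computes PMI(a, b) = log P(a, b) / (P(a) P(b)) and checks that relabeling events through a bijective observation function leaves the kernel unchanged, which is the property the cross-modal convergence argument relies on. It does not train a contrastive learner; it only computes the kernel such a learner is hypothesized to approximate.

```python
import math


def pmi_kernel(joint):
    """Return PMI(a, b) = log P(a, b) / (P(a) * P(b)) for pairs with P(a, b) > 0.

    `joint` maps (a, b) pairs to co-occurrence probabilities that sum to 1.
    """
    # Marginals over the first and second element of each pair.
    p_first, p_second = {}, {}
    for (a, b), p in joint.items():
        p_first[a] = p_first.get(a, 0.0) + p
        p_second[b] = p_second.get(b, 0.0) + p
    return {
        (a, b): math.log(p / (p_first[a] * p_second[b]))
        for (a, b), p in joint.items()
        if p > 0
    }


# Toy, symmetric co-occurrence distribution over hypothetical events (sums to 1).
joint_events = {
    ("sun", "sun"): 0.20, ("sun", "sky"): 0.15, ("sun", "rock"): 0.05,
    ("sky", "sun"): 0.15, ("sky", "sky"): 0.20, ("sky", "rock"): 0.05,
    ("rock", "sun"): 0.05, ("rock", "sky"): 0.05, ("rock", "rock"): 0.10,
}

# Hypothetical bijective observation function: each event maps to one unique
# observation in some modality (here, just an opaque label).
observe = {"sun": "img_0", "sky": "img_1", "rock": "img_2"}
joint_obs = {(observe[a], observe[b]): p for (a, b), p in joint_events.items()}

k_events = pmi_kernel(joint_events)
k_obs = pmi_kernel(joint_obs)

# Under a bijection, the PMI kernel over observations equals the one over events.
kernels_match = all(
    math.isclose(k_events[(a, b)], k_obs[(observe[a], observe[b])])
    for (a, b) in joint_events
)
```

If `observe` were not bijective (for example, two events mapped to the same label), the observation-level joint would merge entries and the equality would generally fail, which is the limitation the bijectivity assumption is meant to rule out.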
Frameworks (1)
framework
- Contrastive learning · implements · Supervised learning framework in which the system learns by observing the contrast between its current response and a nudged, improved response; requires only weak additional forces from the supervisor
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Key limitation of the PRH for non-bijective observations
- Core theoretical claim about the target of representation learning
- Hypothesis tested in Experiment 3; independently trained GPT, Claude, Gemini architectures converge on similar descriptive vocabulary
- How do representations differ or converge between architectures, tasks, and modalities? · question · 0.762 · Broader research question MAS is positioned to address, citing multiple recent works.
- Validates theoretical PMI convergence claim on real data
- Implication of PRH for training practice: both modalities point at the same underlying reality