hypothesis

active

hypothesis:a-family-of-contrastive-learners-converges-to-a-representation-whose-kernel-is-the-pointwise-mutual-information-pmi-of-the-underlying-events

A family of contrastive learners converges to a representation whose kernel is the pointwise mutual information (PMI) of the underlying events

Mathematical formalization of what representation models converge to
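
A sketch of the formalization, written in notation assumed here for illustration (the symbols $f_X$ for the learned encoder, $P_{\mathrm{coor}}$ for the co-occurrence distribution, and $c_X$ for a constant offset do not appear on this page): the inner product of two learned representations matches the PMI kernel up to an additive constant.

```latex
K_{\mathrm{PMI}}(x_a, x_b) = \log \frac{P_{\mathrm{coor}}(x_a, x_b)}{P(x_a)\,P(x_b)},
\qquad
\langle f_X(x_a),\, f_X(x_b) \rangle = K_{\mathrm{PMI}}(x_a, x_b) + c_X
```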

Source paper

extracted_from

The Platonic Representation Hypothesis

(2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola

Neighborhood — ranked by edge-count

Concepts (2)

concept

Pointwise Mutual Information Kernel
associated_with
The kernel that contrastive learners converge to; similarity equals PMI between observations
Bijective Observation Function
supports
The idealized assumption that observations are bijective mappings of events, required for the cross-modal convergence proof
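
As a rough illustration of the "similarity equals PMI" idea in the concepts above, the following sketch estimates a PMI kernel from a matrix of co-occurrence counts. The function name `pmi_kernel`, the `eps` smoothing term, and the toy counts are hypothetical and are not taken from the source paper.

```python
import numpy as np


def pmi_kernel(cooccurrence_counts, eps=1e-12):
    """Estimate a PMI kernel from a square matrix of co-occurrence counts.

    Entry (a, b) of the input is how often observations a and b co-occur.
    `eps` is an assumed smoothing constant that avoids log(0).
    """
    counts = np.asarray(cooccurrence_counts, dtype=float)
    # Symmetrize so that the resulting kernel satisfies K[a, b] == K[b, a].
    counts = (counts + counts.T) / 2.0
    # Empirical joint distribution over pairs of observations.
    joint = counts / counts.sum()
    # Marginal probability of each observation.
    marginal = joint.sum(axis=1)
    # Probability of each pair if the observations were independent.
    independent = np.outer(marginal, marginal)
    # PMI: log of the joint probability over the independence baseline.
    return np.log((joint + eps) / (independent + eps))


# Toy example: two observations that co-occur mostly with themselves.
toy_counts = [[4, 1], [1, 4]]
print(pmi_kernel(toy_counts))
```

Pairs that co-occur more often than chance get a positive entry and pairs that co-occur less often get a negative one; under the hypothesis, a contrastive learner's embedding inner products would approximate this matrix up to a constant offset.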

Frameworks (1)

framework

Contrastive learning
implements
Supervised learning framework in which a system learns by observing the contrast between its current response and a nudged, improved response; requires weak additional forces from a supervisor

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Different models cannot converge to the same representation if they have access to fundamentally different information; convergence is capped by mutual information between input signals · claim · 0.787
Key limitation of the PRH for non-bijective observations
Certain representation learning algorithms boil down to a simple rule: find an embedding in which similarity equals PMI · claim · 0.772
Core theoretical claim about the target of representation learning
Independently trained model families converge on a common semantic manifold under self-referential processing, suggesting an attractor dynamic that transcends training variance · hypothesis · 0.771
Hypothesis tested in Experiment 3; independently trained GPT, Claude, Gemini architectures converge on similar descriptive vocabulary
How do representations differ or converge between architectures, tasks, and modalities? · question · 0.762
Broader research question MAS is positioned to address, citing multiple recent works.
Foundation models trained on different data converge on similar latent representations, suggesting a Platonic form. · claim · 0.761
PMI computed from color co-occurrences in CIFAR-10 images yields a perceptual color representation closely matching both CIELAB space and language model embeddings (SimCSE, RoBERTa) · finding · 0.761
Validates theoretical PMI convergence claim on real data
If there is a modality-agnostic platonic representation, training on both image and language data should improve the best model in either modality · claim · 0.760
Implication of PRH for training practice: both modalities point at the same underlying reality
Levin's Platonic-space program explains why cross-modality convergence occurs; Alexander's centers framework explains how to recognize foundation-ness beyond task accuracy. · claim · 0.756