finding

active

finding:pmi-computed-from-color-cooccurrences-in-cifar-10-images-yields-a-perceptual-color-representation-closely-matching-both-cielab-space-and-language-model-embeddings-simcse-roberta

PMI computed from color cooccurrences in CIFAR-10 images yields a perceptual color representation closely matching both CIELAB space and language model embeddings (SimCSE, RoBERTa)

Validates theoretical PMI convergence claim on real data
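
As an illustration of the quantity involved, PMI over co-occurring colors can be sketched as below. This is a minimal sketch, not the paper's pipeline: it assumes colors have already been quantized into discrete labels and that co-occurrences arrive as a list of pairs. The function name `pmi_from_pairs` and the toy color labels are hypothetical.

```python
import math
from collections import Counter


def pmi_from_pairs(pairs):
    """Return {(a, b): PMI(a, b)} from a list of co-occurring label pairs.

    PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ), estimated from counts.
    Assumption: co-occurrence is symmetric, so each pair is counted both ways.
    """
    joint = Counter()
    for a, b in pairs:
        joint[(a, b)] += 1
        joint[(b, a)] += 1  # symmetric counting

    total = sum(joint.values())

    # Marginal counts are obtained by summing the joint counts over the second item.
    marginal = Counter()
    for (a, _), n in joint.items():
        marginal[a] += n

    return {
        (a, b): math.log((n / total) / ((marginal[a] / total) * (marginal[b] / total)))
        for (a, b), n in joint.items()
    }


# Toy data (made up for illustration): red co-occurs with orange more often than with blue.
toy_pairs = [("red", "orange")] * 3 + [("red", "blue")] + [("blue", "green")] * 4
toy_pmi = pmi_from_pairs(toy_pairs)
```

In this toy example `toy_pmi[("red", "orange")]` is larger than `toy_pmi[("red", "blue")]`, which is the sense in which PMI acts as a similarity score between colors. Pairs that never co-occur are simply absent from the result.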

Source paper

extracted_from

The Platonic Representation Hypothesis

(2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola

Neighborhood — ranked by edge-count

Claims (2)

claim

Certain representation learning algorithms boil down to a simple rule: find an embedding in which similarity equals PMI
supports
Core theoretical claim about the target of representation learning
Learning cooccurrence statistics in either vision or language domain recovers roughly the same perceptual color representation
supports
Empirical validation that PMI convergence actually occurs on real data

Methods (1)

method

CIELAB Color Space
associated_with
Perceptually uniform color space used as ground truth perceptual representation in color cooccurrence experiment
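
For reference, a perceptual distance in CIELAB can be sketched as the Euclidean distance between two colors after converting them from sRGB (the CIE76 color difference). This is a minimal sketch using the standard sRGB/D65 conversion constants. The function names `srgb_to_lab` and `delta_e76` are hypothetical, and the source does not state which conversion or distance the experiment used.

```python
import math

# Standard linear-sRGB -> XYZ matrix and D65 reference white.
_SRGB_TO_XYZ = (
    (0.4124564, 0.3575761, 0.1804375),
    (0.2126729, 0.7151522, 0.0721750),
    (0.0193339, 0.1191920, 0.9503041),
)
_D65_WHITE = (0.95047, 1.0, 1.08883)


def srgb_to_lab(rgb):
    """Convert an sRGB triple with components in [0, 1] to (L*, a*, b*)."""
    # Undo the sRGB transfer function to get linear light.
    lin = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4 for c in rgb]
    xyz = [sum(m * c for m, c in zip(row, lin)) for row in _SRGB_TO_XYZ]

    def f(t):
        d = 6 / 29
        # Cube root above the threshold, linear segment below it.
        return t ** (1 / 3) if t > d ** 3 else t / (3 * d * d) + 4 / 29

    fx, fy, fz = (f(v / w) for v, w in zip(xyz, _D65_WHITE))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e76(rgb1, rgb2):
    """CIE76 color difference: Euclidean distance in CIELAB."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))
```

Under this sketch, `delta_e76((1, 0, 0), (1, 0.5, 0))` (red vs. orange) is smaller than `delta_e76((1, 0, 0), (0, 0, 1))` (red vs. blue), matching the intuition that CIELAB distances track perceived color difference.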

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Color distances learned from language cooccurrence statistics closely mirror those learned from image cooccurrence statistics and human perceptual distances (CIELAB) · finding · 0.822
Case study confirming that PMI-based learning in different modalities recovers the same perceptual representation
A family of contrastive learners converges to a representation whose kernel is the pointwise mutual information (PMI) of the underlying events · hypothesis · 0.761
Mathematical formalization of what representation models converge to
Most independent dimension pair is aesthetic_response and boundary_awareness (rho=0.553); most correlated is prediction_error and conceptual_crystallization (rho=0.886) · finding · 0.745
Characterizes internal structure of the six scoring dimensions
Embedding-based construct classifiers achieve mean accuracy and F1-macro of 95.96% across OCEAN, HEXACO, Dark Tetrad, CMNI, CFNI constructs · finding · 0.744
Validates use of lightweight classifiers as replacement for frontier LLM evaluation during alpha sweeps
In AOMIC ID1000 movie-watching fMRI data, NIS+ finds a one-dimensional macro-state representing 100-dimensional micro-states. · finding · 0.743
Real brain imaging result suggesting a compressed emergent representation.
Image is modeled as continuous infinite 2D space (R^2), giving resolution independence. · claim · 0.742
Key benefit of the denotational design for images.
Diverse computer vision models trained on visual recognition tasks converge to remarkably similar internal feature representations regardless of architecture, training procedure, or implementation details, closely matching the organization of animal visual cortex · finding · 0.742
Empirical evidence for the universality hypothesis cited as supporting the possibility of convergent consciousness-like solutions
On CIFAR-10, larger models exhibit greater alignment with each other compared to smaller ones · finding · 0.741
Kornblith et al. / Krizhevsky finding replicated in paper discussion