claim

active

claim:models-that-are-competent-all-represent-data-in-a-similar-way-all-strong-models-are-alike-each-weak-model-is-weak-in-its-own-way

Models that are competent all represent data in a similar way; all strong models are alike, each weak model is weak in its own way

Authors' interpretation of the VTAB alignment results, echoing Tolstoy

Source paper

extracted_from

The Platonic Representation Hypothesis

(2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola

Neighborhood — ranked by edge-count

Papers (1)

paper

The Platonic Representation Hypothesis
introduces

Findings (2)

finding

Among 78 vision models, those solving more VTAB tasks (higher transfer performance) show higher mutual nearest-neighbor alignment with each other
supports
Key empirical finding establishing that representational alignment correlates with model competence
Among 78 vision models on Places-365, models that solve more VTAB tasks tend to be more aligned with each other, with high-performance models forming a tightly clustered set
supports
Empirical result showing alignment increases with model competence
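
The mutual nearest-neighbor alignment named in these findings can be illustrated roughly as follows. This is a minimal sketch, not the paper's code: the function names, the use of cosine similarity for the neighbor search, and the default k = 10 are all assumptions made for illustration. For each input, it takes the k nearest neighbors under each model's features and reports the average fraction of neighbors the two models share.

```python
import numpy as np


def knn_indices(feats, k):
    """Indices of the k nearest neighbors of each row (cosine similarity, self excluded)."""
    # Assumption: cosine similarity as the neighbor metric.
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]


def mutual_knn_alignment(feats_a, feats_b, k=10):
    """Mean fraction of shared k-nearest neighbors between two feature sets.

    feats_a and feats_b hold features of the same inputs, row-aligned,
    from two different models. Their feature dimensions may differ.
    """
    nn_a = knn_indices(feats_a, k)
    nn_b = knn_indices(feats_b, k)
    overlap = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlap))
```

A score of 1.0 means both models induce identical neighborhoods on the evaluated inputs; scores near 0 mean the neighborhoods barely overlap.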

Concepts (1)

concept

Anna Karenina Scenario
extends
Hypothesis that all well-performing neural nets represent the world in the same way; PRH extends this by specifying what representation they converge to

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

a model becomes strongly confident in its final answer, but continues generating tokens without revealing its internal belief · quote · 0.798
Core definitional quote for performative chain-of-thought
We stress that in today’s models, this capacity is highly unreliable and context-dependent; however, it may continue to develop with further improvements to model capabilities. · quote · 0.790
Caveat and forward-looking statement from the abstract.
Different models cannot converge to the same representation if they have access to fundamentally different information; convergence is capped by mutual information between input signals · claim · 0.788
Key limitation of the PRH for non-bijective observations
Models more effective at recognizing abstract nouns than other concept types · finding · 0.788
Opus 4.1 demonstrates highest introspective awareness on abstract nouns (justice, peace, betrayal) with nonzero awareness across all concept categories tested.
The model tends to reflect more when the question is difficult, and accuracy is generally lower for harder questions · hypothesis · 0.785
Hypothesis explaining negative correlation between reflection rate and accuracy without implying reflection is harmful
Earlier/less capable models exhibit a larger gap between think and don't think representation strength · finding · 0.785
Claude 3 models show a bigger difference than newer models like Opus 4.1.
There are fewer representations competent for N tasks than M < N tasks, so training more general models should yield fewer possible solutions · hypothesis · 0.783
Selective pressure toward convergence via task generality
Model preferences are not consistent across contexts but tend to be relatively consistent within a single context · claim · 0.783
Authors' characterization of the nature of model preferences as discovered through alignment faking experiments
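
The filter behind this list (cosine ≥ 0.65, no typed edge) can be sketched as below. This is an illustration under stated assumptions, not the tool's actual implementation: the function name, its parameters, and the idea that each entity has a single embedding vector are all hypothetical.

```python
import numpy as np


def similarity_candidates(query, embeddings, names, typed_neighbors, threshold=0.65):
    """Entities whose cosine similarity to `query` meets the threshold
    and that have no typed edge to it, sorted by descending score.

    All parameter names are hypothetical; `typed_neighbors` is the set of
    entity names already linked to the query entity by a typed edge.
    """
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = e @ q  # cosine similarity of each entity to the query
    hits = [
        (name, float(score))
        for name, score in zip(names, scores)
        if score >= threshold and name not in typed_neighbors
    ]
    return sorted(hits, key=lambda pair: -pair[1])
```

Entities returned this way are candidates for review only; a high score does not by itself establish a typed relation or a duplicate.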