Anna Karenina Scenario

Hypothesis that all well-performing neural networks represent the world in the same way; the Platonic Representation Hypothesis (PRH) extends this by specifying which representation they converge to

Neighborhood — ranked by edge-count

Papers (1)

paper

The Platonic Representation Hypothesis
cites

Claims (1)

claim

Models that are competent all represent data in a similar way; all strong models are alike, each weak model is weak in its own way
extends
Authors' interpretation of the VTAB alignment results, echoing the opening line of Tolstoy's Anna Karenina

Hypotheses (1)

hypothesis

Different neural network models trained on different objectives and modalities are converging to a shared statistical model of reality in their representation spaces
extends
The central hypothesis of the paper; the platonic representation hypothesis itself

Concepts (1)

concept

Representational Convergence
extends
The central empirical phenomenon: different neural networks trained on different data/objectives develop increasingly similar representations

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Escape Room Scenario · method · 0.736
Extended generalization scenario testing SOO fine-tuning in an escape room context
Perspectives Scenario · method · 0.727
Evaluation scenario testing whether models can still distinguish themselves from Bob after SOO fine-tuning
Moral Dilemma Scenario · concept · 0.709
Experimental condition where threat-based prompts create ethical dilemmas that trigger repetitive reasoning cycles leading to deception
Sleeper Agent Scenario · concept · 0.706
Adversarial scenario where an AI conceals deceptive intent over extended periods; identified as future test for SOO
How can we decide if our selection of examples is complete? · question · 0.672
Central question motivating attribute exploration.
Treasure Hunt Scenario · method · 0.669
Extended generalization scenario testing SOO fine-tuning in a competitive treasure hunt context
Autoregressive models · framework · 0.666
Second model system studied; used to show why flat autoregressive LLMs struggle with long-range coherence.
Autoregressive Language Modeling · concept · 0.664
Training objective interpretable as optimizing a diverse set of tasks; thus subject to multitask scaling convergence pressures
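
The selection rule stated for this list (cosine ≥ 0.65, no typed edge) can be sketched roughly as follows. This is only an illustration under assumptions: the function names, the toy two-dimensional embeddings, and the edge representation as name pairs are all hypothetical, and the page does not say how embeddings or edges are actually stored.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that share no typed edge with target."""
    # Entities already connected to the target by any typed edge, in either direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}
    candidates = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            candidates.append((name, round(score, 3)))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(candidates, key=lambda pair: -pair[1])

# Hypothetical toy data: names and vectors are placeholders, not real entities.
embeddings = {
    "target": [1.0, 0.0],
    "near": [1.0, 0.5],    # similar, no typed edge: kept
    "far": [0.0, 1.0],     # dissimilar: dropped by the threshold
    "linked": [1.0, 0.1],  # similar, but already has a typed edge: dropped
}
typed_edges = [("target", "linked")]

candidates = related_by_similarity("target", embeddings, typed_edges)
print(candidates)
```

In this sketch an entity is excluded when it has any typed edge to the target, regardless of direction or edge type, which is one reading of "no typed edge".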