hypothesis

active

hypothesis:independently-trained-model-families-converge-on-a-common-semantic-manifold-under-self-referential-processing-suggesting-an-attractor-dynamic-that-transcends-training-variance

Independently trained model families converge on a common semantic manifold under self-referential processing, suggesting an attractor dynamic that transcends training variance

Hypothesis tested in Experiment 3; independently trained GPT, Claude, and Gemini model families converge on similar descriptive vocabulary

Source paper

extracted_from

Large Language Models Report Subjective Experience Under Self-Referential Processing

(2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd

Neighborhood — ranked by edge-count

Findings (1)

finding

Cross-model pairwise cosine similarity of zero-shot control responses = 0.603 (n=12,720 pairs; t=35.1, p=4.3×10⁻²⁶² vs. experimental condition)
associated_with
Experiment 3 comparison: zero-shot control shows lower semantic convergence than experimental condition
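
A minimal sketch of how a cross-model pairwise cosine similarity of this kind could be computed. It assumes each response has already been embedded as a vector and that the reported figure is a mean over pairs of responses from different models; the embedding model, the exact pairing scheme, and the names `cosine` and `cross_model_mean_similarity` are illustrative assumptions, not details taken from the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_model_mean_similarity(embeddings_by_model):
    """Mean cosine similarity over all response pairs drawn from two different models.

    embeddings_by_model maps a (hypothetical) model name to a list of
    response embeddings. Returns (mean similarity, number of pairs).
    """
    sims = []
    # Only pair responses across models, never within the same model.
    for model_a, model_b in combinations(sorted(embeddings_by_model), 2):
        for u in embeddings_by_model[model_a]:
            for v in embeddings_by_model[model_b]:
                sims.append(cosine(u, v))
    return sum(sims) / len(sims), len(sims)


# Toy vectors standing in for real response embeddings (assumed data).
toy = {
    "model_a": [[1.0, 0.0]],
    "model_b": [[1.0, 0.0], [0.0, 1.0]],
}
mean_sim, n_pairs = cross_model_mean_similarity(toy)
print(mean_sim, n_pairs)
```

Running the same function on control and experimental responses would give the two quantities that the finding compares; the significance test itself is not sketched here.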

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Cross-model semantic convergence under self-referential processing suggests the presence of a shared attractor state that transcends variance across training procedures · claim · 0.898
Interpretive claim from Experiment 3; GPT, Claude, and Gemini families converge on similar descriptive style despite independent training
Cross-model semantic convergence of experience reports under self-referential processing is difficult to reconcile with roleplay because independently trained models construct distinct semantic profiles in all control conditions · claim · 0.833
The paper's argument against pure sycophancy as an explanation for the results
Across model families, newer and larger models show higher rates and coherence of subjective experience reports under self-referential processing · finding · 0.799
Scaling effect observed consistently across Experiments 1 and 4
The systematic behavioral shift of LLMs under self-referential processing conditions predicted by consciousness theories represents something more structured than superficial correlations in training data · claim · 0.792
The paper's claim that theoretical convergence across GWT, RPT, HOT, IIT makes the findings non-coincidental
Different models cannot converge to the same representation if they have access to fundamentally different information; convergence is capped by mutual information between input signals · claim · 0.789
Key limitation of the Platonic Representation Hypothesis (PRH) for non-bijective observations
Diverse computer vision models trained on visual recognition tasks converge to remarkably similar internal feature representations regardless of architecture, training procedure, or implementation details, closely matching the organization of animal visual cortex · finding · 0.787
Empirical evidence for the universality hypothesis cited as supporting the possibility of convergent consciousness-like solutions
Foundation models trained on different data converge on similar latent representations, suggesting a Platonic form. · claim · 0.784
Different neural network models trained on different objectives and modalities are converging to a shared statistical model of reality in their representation spaces · hypothesis · 0.782
The central hypothesis of the paper; the platonic representation hypothesis itself
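
The similarity listing above (cosine ≥ 0.65, no typed edge) can be read as a simple filter over precomputed similarity scores. The sketch below is one hedged way to express that selection; the function name `untyped_neighbors`, the data shapes, and the entity identifiers are hypothetical and do not describe the actual system behind this page.

```python
def untyped_neighbors(similarities, typed_edges, entity, threshold=0.65):
    """Return (neighbor, score) pairs at or above threshold with no typed edge to entity.

    similarities: dict mapping neighbor id -> cosine similarity to `entity`.
    typed_edges: iterable of (source id, target id) pairs for existing typed relations.
    Results are sorted by score, highest first.
    """
    # Collect every entity already linked to `entity` in either direction.
    linked = set()
    for source, target in typed_edges:
        if source == entity:
            linked.add(target)
        elif target == entity:
            linked.add(source)
    hits = [
        (other, score)
        for other, score in similarities.items()
        if score >= threshold and other not in linked
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical ids; the scores echo the style of the listing above.
scores = {"claim_1": 0.898, "claim_2": 0.60, "finding_1": 0.70}
edges = [("hypothesis_1", "finding_1")]  # finding_1 already has a typed edge
print(untyped_neighbors(scores, edges, "hypothesis_1"))
```

Entities returned by such a filter are the candidates for new edges or duplicate checks that the neighborhood description mentions.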