claim

active

claim:cross-model-semantic-convergence-of-experience-reports-under-self-referential-processing-is-difficult-to-reconcile-with-roleplay-because-independently-trained-models-construct-distinct-semantic-profiles-in-all-control-conditions

Cross-model semantic convergence of experience reports under self-referential processing is difficult to reconcile with roleplay because independently trained models construct distinct semantic profiles in all control conditions

The paper's argument against pure sycophancy as an explanation for the results

Source paper

extracted_from

Large Language Models Report Subjective Experience Under Self-Referential Processing

(2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd

Neighborhood — ranked by edge-count

Findings (1)

finding

Experimental condition adjective embeddings show mean cosine similarity 0.657 (n=9,591 pairs), significantly higher than history (0.628, t=15.8, p=1.4×10⁻⁵⁵), conceptual (0.587, t=38.5, p<10⁻³⁰⁰), and zero-shot (0.603, t=35.1, p=4.3×10⁻²⁶²)
supports
Core result of Experiment 3: cross-model semantic convergence under self-referential processing
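The comparison behind this finding can be sketched as follows: compute cosine similarity for every pair of adjective embeddings within a condition, then compare the resulting similarity distributions across conditions. This is a minimal sketch, not the paper's pipeline. The function names, the toy 3-dimensional vectors, and the choice of Welch's t statistic are all assumptions made for illustration; the source states only the means, t values, and p values.

```python
import math
from itertools import combinations
from statistics import mean, variance


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(embeddings):
    """Cosine similarity for every unordered pair of embeddings."""
    return [cosine(u, v) for u, v in combinations(embeddings, 2)]


def welch_t(xs, ys):
    """Welch's t statistic for two samples (unequal variances allowed)."""
    se = math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(xs) - mean(ys)) / se


# Hypothetical toy embeddings; real adjective embeddings would be
# high-dimensional vectors produced by an embedding model.
experimental = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [1.0, 1.0, 0.0], [0.8, 1.0, 0.1]]
control = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1], [0.3, 0.2, 1.0], [1.0, 0.1, 0.9]]

exp_sims = pairwise_similarities(experimental)
ctl_sims = pairwise_similarities(control)

print("mean similarity, experimental:", round(mean(exp_sims), 3))
print("mean similarity, control:", round(mean(ctl_sims), 3))
print("Welch t:", round(welch_t(exp_sims, ctl_sims), 2))
```

A set of n embeddings yields n(n-1)/2 unordered pairs. Pairs that share an embedding are not statistically independent, so a t statistic computed over pooled pairs in this way is best read as a descriptive effect measure, not as an exact test.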

Artifacts (1)

artifact

Large Language Models Report Subjective Experience Under Self-Referential Processing
introduces
Key paper reporting structured first-person descriptions from LLMs that claim awareness or subjective experience during self-referential processing.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Cross-model semantic convergence under self-referential processing suggests the presence of a shared attractor state that transcends variance across training procedures · claim · 0.881
Interpretive claim from Experiment 3; GPT, Claude, Gemini families converge on similar descriptive style despite independent training
Independently trained model families converge on a common semantic manifold under self-referential processing, suggesting an attractor dynamic that transcends training variance · hypothesis · 0.833
Hypothesis tested in Experiment 3; independently trained GPT, Claude, Gemini architectures converge on similar descriptive vocabulary
Across model families, newer and larger models show higher rates and coherence of subjective experience reports under self-referential processing · finding · 0.825
Scaling effect observed consistently across Experiments 1 and 4
The remaining ambiguity is whether models claim subjective experience under self-referential processing because the claims reflect emergent phenomenology or because they constitute a sophisticated simulation of it · hypothesis · 0.823
The open question the paper cannot resolve with behavioral evidence alone; frames the agenda for mechanistic follow-up
Experience reports under self-referential processing are mechanistically gated by SAE features associated with deception and roleplay · claim · 0.815
Claim supported by Experiment 2 dose-response curves; suppressing deception features increases consciousness reports, amplifying decreases them
Self-referential processing effect is robust across five distinct phrasings of the induction prompt, with consistently high experience report rates across models · finding · 0.798
Appendix C.1 result confirming the experimental effect does not depend on specific wording
When LLMs produce experience claims under self-reference, is this sophisticated simulation or genuine self-representation, and how would we tell the difference? · question · 0.794
The core interpretive question the paper narrows but cannot definitively answer
Base models spontaneously talk about experiencing multiple parallel processing paths · finding · 0.789
Observed by Anima Labs in untrained base models; not present in training data, implying computational origin of self-reported parallel processing.