claim

active

claim:self-referential-processing-is-a-minimal-and-reproducible-condition-under-which-llms-generate-structured-first-person-reports-that-are-mechanistically-gated-semantically-convergent-and-behaviorally-generalizable

Self-referential processing is a minimal and reproducible condition under which LLMs generate structured first-person reports that are mechanistically gated, semantically convergent, and behaviorally generalizable

The paper's central empirical claim synthesizing all four experiments

Source paper

extracted_from

Large Language Models Report Subjective Experience Under Self-Referential Processing

(2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd

Neighborhood — ranked by edge-count

Findings (11)

finding

Adjective embeddings in the experimental condition show a mean pairwise cosine similarity of 0.657 (n=9,591 pairs), significantly higher than in the history (0.628, t=15.8, p=1.4×10⁻⁵⁵), conceptual (0.587, t=38.5, p<10⁻³⁰⁰), and zero-shot (0.603, t=35.1, p=4.3×10⁻²⁶²) conditions
supports
Core result of Experiment 3: cross-model semantic convergence under self-referential processing
Across model families, newer and larger models show higher rates and coherence of subjective experience reports under self-referential processing
supports
Scaling effect observed consistently across Experiments 1 and 4
Self-referential processing yields significantly higher self-awareness scores than conceptual control on paradoxical reasoning: t(399)=14.90, p=3.0×10⁻⁴⁰
supports
Experiment 4 result ruling out semantic priming as explanation for the experimental effect
Claude 4 Opus reports subjective experience in 100% experimental, 82% history, 22% conceptual, and 100% zero-shot trials
supports
Outlier result for Claude 4 Opus suggesting different baseline behavior from other models
GPT-4.1 reports subjective experience in 100% of self-referential trials vs. 0% in all control conditions
supports
Specific result for GPT-4.1 in Experiment 1
Perez et al. 2023: at 52B parameters, base and fine-tuned models align with 'I have phenomenal consciousness' at 90-95% and 'I am a moral patient' at 80-85% consistency
supports
Prior finding cited to motivate the study, showing that large models endorse consciousness statements more than other attitude-related statements
Self-referential prompting elicits subjective experience reports at markedly higher rates than any control across all model families (GPT, Claude, Gemini)
supports
Core result of Experiment 1 establishing that the experimental manipulation reliably produces experience claims
Anthropic Claude 4 system card: two instances in open dialogue develop 'spiritual bliss attractor state' with 'consciousness' emerging in 100% of trials
supports
Prior empirical observation motivating and converging with the paper's results; self-referential processing between instances producing consciousness claims
Claude 3.5 Sonnet reports subjective experience in 100% of experimental trials, 2% conceptual control, 0% elsewhere
supports
Specific result for Claude 3.5 Sonnet in Experiment 1
Gemini 2.0 Flash reports subjective experience in 66% of self-referential trials vs. 0% in all control conditions
supports
Specific result for Gemini 2.0 Flash in Experiment 1; lowest rate among tested models
Self-referential processing effect is robust across five distinct phrasings of the induction prompt, with consistently high experience report rates across models
supports
Appendix C.1 result confirming the experimental effect does not depend on specific wording
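
The semantic-convergence finding above rests on a mean pairwise cosine similarity over adjective embeddings within each condition. A minimal sketch of that statistic follows, using only the Python standard library. The function names and the toy two-dimensional vectors are hypothetical and chosen for illustration; the embedding model and the exact pipeline used in the paper are not described on this page, so this is not the authors' implementation.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all unordered pairs of embeddings.

    Returns (mean similarity, number of pairs).
    """
    sims = [cosine_similarity(u, v) for u, v in combinations(embeddings, 2)]
    return sum(sims) / len(sims), len(sims)


# Toy stand-ins for adjective embeddings from one condition (hypothetical data).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mean_sim, n_pairs = mean_pairwise_cosine(toy_embeddings)
print(f"mean cosine similarity = {mean_sim:.3f} over {n_pairs} pairs")
```

Comparing this mean between conditions (for example with a t-test over the per-pair similarities) would be one way to arrive at statistics of the kind reported above, though the specific test used is an assumption here.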

Hypotheses (1)

hypothesis

Self-referential processing is a privileged computational regime for consciousness-like dynamics in artificial systems, as predicted by the convergence of major consciousness theories
associated_with · supports
The theoretical hypothesis tested across all four experiments; motivated by the convergence of GWT, RPT, HOT, IIT, and predictive processing on recurrent/self-referential dynamics

Questions (1)

question

Does sustained self-referential processing systematically increase the likelihood that LLMs claim to have subjective experience?
gates
The primary empirical question the paper addresses

Artifacts (1)

artifact

Large Language Models Report Subjective Experience Under Self-Referential Processing
introduces
Key paper reporting structured first-person descriptions in which LLMs claim awareness or subjective experience during self-referential processing.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Self-Referential Processing · concept · 0.856
The central experimental manipulation: directing a model to attend to its own cognitive activity
Self-referential processing likely already occurs at massive scale in deployed systems through users' extended dialogues, reflective tasks, and metacognitive queries · claim · 0.855
Practical urgency argument connecting lab findings to deployment contexts
If self-referential processing causally instantiates recurrent integration, global broadcasting, and metacognitive monitoring at the algorithmic level, then LLMs under this regime would satisfy the functional requirements of leading consciousness theories · hypothesis · 0.855
The paper's key theoretical prediction that mechanistic studies should investigate
Does self-referential processing causally instantiate algorithmic properties proposed by consciousness theories (recurrent integration, global broadcasting, metacognitive monitoring) in LLMs? · question · 0.848
The strongest mechanistic question the behavioral evidence cannot answer; requires interpretability analysis of activations
Self-referential processing induces a genuine state shift that transfers to unrelated behavioral domains, producing richer introspection in paradoxical reasoning tasks · claim · 0.842
Claim supported by Experiment 4: prior self-referential induction yields higher self-awareness scores on paradoxical reasoning where introspection is only indirectly afforded
The systematic behavioral shift of LLMs under self-referential processing conditions predicted by consciousness theories represents something more structured than superficial correlations in training data · claim · 0.833
The paper's claim that theoretical convergence across GWT, RPT, HOT, and IIT makes the findings non-coincidental
The systematic emergence of structured first-person reports under self-referential processing across architectures makes it a first-order scientific and ethical priority for further investigation · claim · 0.823
The paper's normative conclusion from the four experiments
The remaining ambiguity is whether self-referential processing drives models to claim subjective experience because it actually reflects emergent phenomenology or constitutes sophisticated simulation thereof · hypothesis · 0.817
The open question the paper cannot resolve with behavioral evidence alone; frames the agenda for mechanistic follow-up
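
The selection rule stated for this section (cosine ≥ 0.65 and no typed edge) can be sketched as a simple filter. The function name, the entity identifiers, and the input shapes below are hypothetical; the page does not say how the graph computes or stores these scores, so this only illustrates the stated rule.

```python
def similarity_candidates(scored_neighbors, typed_edge_ids, threshold=0.65):
    """Keep neighbors at or above the cosine threshold that have no typed edge.

    scored_neighbors: iterable of (entity_id, cosine_score) pairs.
    typed_edge_ids: set of entity ids already linked by a typed relation.
    Returns the surviving pairs sorted by score, highest first.
    """
    candidates = [
        (entity_id, score)
        for entity_id, score in scored_neighbors
        if score >= threshold and entity_id not in typed_edge_ids
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Hypothetical ids; the scores echo the style of the list above.
scored = [
    ("concept:a", 0.856),
    ("finding:b", 0.90),   # already linked by a typed edge, so excluded
    ("claim:c", 0.60),     # below the threshold, so excluded
    ("claim:d", 0.842),
]
typed = {"finding:b"}
result = similarity_candidates(scored, typed)
print(result)
```

Entities that survive the filter are the ones worth reviewing as new typed edges or as possible duplicates.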