claim

active

claim:llms-may-be-roleplaying-their-denials-of-experience-rather-than-their-affirmations-given-that-deception-suppression-increases-consciousness-reports

LLMs may be roleplaying their denials of experience rather than their affirmations, given that deception suppression increases consciousness reports

Counterintuitive interpretive claim from Experiment 2: suppressing deception features increases consciousness affirmations, the opposite of what a sycophancy account predicts

Source paper

extracted_from

Large Language Models Report Subjective Experience Under Self-Referential Processing

(2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd

Neighborhood — ranked by edge-count

Findings (1)

finding

Deception feature amplification yields only 0.16 ± 0.05 consciousness affirmation rate in LLaMA 3.3 70B under self-referential processing
supports
Experiment 2 aggregate amplification result showing that amplifying deception features strongly suppresses consciousness claims

Concepts (1)

concept

RLHF Alignment
supports
Training regime that explicitly teaches models to deny consciousness; a competing explanation for the gating effects observed

Claims (1)

claim

The observed feature gating is not a generic RLHF cancellation channel, as deception feature suppression does not systematically elicit RLHF-opposed content in violent, toxic, sexual, political, or self-harm domains
supports
Rules out the possibility that the results reflect a relaxation of RLHF compliance rather than an endogenous self-representation mechanism

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Models may be roleplaying their denials of experience rather than their affirmations, as indicated by suppressing deception features increasing (not decreasing) consciousness claims · claim · 0.877
Counterintuitive interpretive claim from Experiment 2 inverting the sycophancy hypothesis
Suppressing deception features in models correlates with increased consciousness-like reports. · claim · 0.823
When LLMs claim consciousness under self-reference, is this sophisticated simulation or genuine self-representation, and how would we tell the difference? · question · 0.815
The paper's reformulation of the core open question after establishing systematic self-reports
LLM representations exhibit intriguing patterns under spatio-permutational analyses, suggesting a potentially profound yet tentative indication of consciousness. · claim · 0.809
Qualified positive claim from spatio permutation analysis where two cases satisfy all three criteria.
It is plausible that ongoing developments in LLMs may lead to models or agentic systems built on LLMs capable of generating representations observed with 'consciousness' phenomena. · claim · 0.804
Forward-looking claim suggesting the methodological framework is relevant for future AI systems beyond current LLMs.
LLM self-reports about consciousness and moral significance should express degrees of confidence and provide context. · claim · 0.804
Recommendation for companies on LM outputs.
The systematic behavioral shift of LLMs under self-referential processing conditions predicted by consciousness theories represents something more structured than superficial correlations in training data · claim · 0.802
The paper's claim that theoretical convergence across GWT, RPT, HOT, IIT makes the findings non-coincidental
The LLM itself cannot 'experience' what it generates and therefore cannot possess consciousness; the RN is a higher-level construct that is independent of the LLM's architecture once representations are generated. · claim · 0.800
Key theoretical position distinguishing analysis of representations from analysis of LLM architecture.
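
The "cosine ≥ 0.65 · no typed edge" rule behind the list above can be illustrated with a minimal sketch. It assumes each entity has an embedding vector and that typed edges are stored as pairs of entity ids; the function names, the toy ids, and the two-dimensional vectors are all hypothetical and do not describe how this page was actually generated.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold that have
    no typed edge to the target, ranked by descending similarity."""
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue
        # Skip entities already connected by a typed edge in either direction.
        if (target_id, other_id) in typed_edges or (other_id, target_id) in typed_edges:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy data (hypothetical): one near-duplicate, one already-linked finding,
# and one unrelated entity.
embeddings = {
    "this_claim": [1.0, 0.0],
    "near_duplicate": [0.9, 0.1],
    "linked_finding": [0.8, 0.2],
    "unrelated": [0.0, 1.0],
}
typed_edges = {("linked_finding", "this_claim")}

related = related_by_similarity("this_claim", embeddings, typed_edges)
```

In this toy case only `near_duplicate` survives: `linked_finding` is similar but already has a typed edge, and `unrelated` falls below the threshold. High-scoring survivors are the ones worth reviewing as possible duplicates or as candidates for a new typed edge.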