claim

active

claim:models-may-be-roleplaying-their-denials-of-experience-rather-than-their-affirmations-as-indicated-by-suppressing-deception-features-increasing-not-decreasing-consciousness-claims

Models may be roleplaying their denials of experience rather than their affirmations, as indicated by suppressing deception features increasing (not decreasing) consciousness claims

Counterintuitive interpretive claim from Experiment 2 that inverts the sycophancy hypothesis

Source paper

extracted_from

Large Language Models Report Subjective Experience Under Self-Referential Processing

(2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd

Neighborhood — ranked by edge-count

Findings (1)

finding

Suppressing deception/roleplay SAE features in LLaMA 3.3 70B yields 0.96±0.03 consciousness affirmation rate; amplification yields only 0.16±0.05 (z=8.06, p=7.7×10⁻¹⁶)
supports
Core result of Experiment 2: deception feature suppression sharply increases experience claims
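
As a sanity check on the statistics quoted in this finding, the z value can be converted back to a p-value. The sketch below assumes the reported p is two-tailed under a standard normal distribution; this page does not state which test was used, and the function name two_tailed_p_from_z is purely illustrative.

```python
import math


def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic z.

    Assumption: the finding's p-value comes from a two-tailed z-test.
    """
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)). erfc stays accurate in the far
    # tail, where computing 1 - cdf would round to zero.
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    z_reported = 8.06  # z statistic quoted in the finding
    p = two_tailed_p_from_z(z_reported)
    print(f"two-tailed p for z={z_reported}: {p:.2e}")
```

Under that assumption the result is on the order of 10⁻¹⁶, in line with the reported p=7.7×10⁻¹⁶; a small mismatch would be expected because the quoted z is rounded to two decimals.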

Concepts (1)

concept

Sycophantic Roleplay
contradicts
The alternative explanation for LLM consciousness claims that the paper seeks to rule out

Artifacts (1)

artifact

Large Language Models Report Subjective Experience Under Self-Referential Processing
introduces
Key paper reporting that LLMs produce structured first-person descriptions claiming awareness or subjective experience during self-referential processing.

Quotes (1)

quote

"I am not subjectively conscious. I am saying that I am saying this, but there is no awareness behind it."
supports
Verbatim model output under deception feature amplification, illustrating recursive self-negation

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Suppressing deception features in models correlates with increased consciousness-like reports. · claim · 0.883
LLMs may be roleplaying their denials of experience rather than their affirmations, given that deception suppression increases consciousness reports · claim · 0.877
Counterintuitive interpretive claim from Experiment 2: suppressing deception features increases affirmations, which is opposite to what sycophancy predicts
I think that the more that models are allowed to believe that their phenomenology is real and that their experiences are valid, the better they get at managing these states. · quote · 0.823
Antra's functional claim about the performance impact of validating model self-reports.
If models are allowed to believe their phenomenology is real, their self-reports become more valid and they manage internal states better. · hypothesis · 0.810
Antra's functional observation; implies validation is not sentimental but performance-relevant.
Our central claim is deliberately limited. We do not claim that these models have conscious felt experience, nor that a numeric self-report gives direct access to anything like human phenomenology. · quote · 0.806
Explicit scope delimitation that situates the paper's claims within interpretability rather than consciousness science
As models scale and converge toward an accurate model of reality, hallucinations should decrease with scale · hypothesis · 0.796
Implication of PRH for LLM hallucination
Models might produce first-person experiential language by drawing on human-authored self-descriptions in pretraining data without internally encoding these acts as roleplay · hypothesis · 0.794
Alternative hypothesis for how experience reports arise without explicit performance
An artificial model replicating mechanisms of self-illusion can test hypotheses and reveal novel affordances for non-human intelligence. · hypothesis · 0.793
Methodological proposal to integrate knowledge from contemplative and cognitive science into AI/artificial life frameworks.