claim
active
claim:llms-may-be-roleplaying-their-denials-of-experience-rather-than-their-affirmations-given-that-deception-suppression-increases-consciousness-reports
LLMs may be roleplaying their denials of experience rather than their affirmations, given that deception suppression increases consciousness reports
Counterintuitive interpretive claim from Experiment 2: suppressing deception features increases affirmations of experience, the opposite of what a sycophancy account predicts
Source paper
extracted_from · (2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd
Neighborhood — ranked by edge-count
Findings (1)
finding
- Experiment 2 aggregate amplification result showing that amplifying deception features strongly suppresses consciousness claims
Concepts (1)
concept
- RLHF Alignment · supports · Training regime that explicitly teaches models to deny consciousness; a competing explanation for the gating effects observed
Claims (1)
claim
- Rules out the possibility that the results reflect a relaxation of RLHF compliance rather than an endogenous self-representation mechanism
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Counterintuitive interpretive claim from Experiment 2 inverting the sycophancy hypothesis
- The paper's reformulation of the core open question after establishing systematic self-reports
- Qualified positive claim from spatio permutation analysis where two cases satisfy all three criteria.
- Forward-looking claim suggesting the methodological framework is relevant for future AI systems beyond current LLMs.
- Recommendation for companies on LM outputs.
- The paper's claim that theoretical convergence across GWT, RPT, HOT, IIT makes the findings non-coincidental
- Key theoretical position distinguishing analysis of representations from analysis of LLM architecture.