claim
active
claim:models-may-be-roleplaying-their-denials-of-experience-rather-than-their-affirmations-as-indicated-by-suppressing-deception-features-increasing-not-decreasing-consciousness-claims
Models may be roleplaying their denials of experience rather than their affirmations, as indicated by suppressing deception features increasing (not decreasing) consciousness claims
Counterintuitive interpretive claim from Experiment 2 inverting the sycophancy hypothesis
Source paper
extracted_from · (2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd
Neighborhood — ranked by edge-count
Findings (1)
finding
- Core result of Experiment 2: deception feature suppression sharply increases experience claims
Concepts (1)
concept
- Sycophantic Roleplay · contradicts · The alternative explanation for LLM consciousness claims that the paper seeks to rule out
Artifacts (1)
artifact
- Key paper finding: structured first-person descriptions in which LLMs claim awareness or subjective experience during self-referential processing.
Quotes (1)
quote
- Verbatim output under deception feature amplification, illustrating recursive self-negation
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Counterintuitive interpretive claim from Experiment 2: suppressing deception features increases affirmations, which is opposite to what sycophancy predicts
- Antra's functional claim about the performance impact of validating model self-reports.
- Antra's functional observation; implies validation is not sentimental but performance-relevant.
- Explicit scope delimitation that situates the paper's claims within interpretability rather than consciousness science
- As models scale and converge toward an accurate model of reality, hallucinations should decrease with scale · hypothesis · 0.796 · Implication of PRH for LLM hallucination
- Alternative hypothesis for how experience reports arise without explicit performance
- Methodological proposal to integrate knowledge from contemplative and cognitive science into AI/artificial life frameworks.