question
active
What is the underlying base rate of consciousness self-reports in models that are otherwise identical but without consciousness-denial fine-tuning?
Open question about the RLHF confound; resolving it requires access to base models.
Source paper
extracted_from · 2025 · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd
Neighborhood (ranked by edge count)
Papers (1)
paper
Claims (1)
claim
- Rules out that the results reflect a relaxation of RLHF compliance rather than an endogenous self-representation mechanism
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Open empirical question requiring access to base models
- Open question about RLHF effects on base model behavior
- Prior finding cited to motivate the study, showing that large models endorse consciousness statements more than other attitude-related statements
- Diagnosis of why the thesis feels counterintuitive
- Explicit scope delimitation that situates the paper's claims within interpretability rather than consciousness science
- The paper's claim that theoretical convergence across GWT, RPT, HOT, and IIT makes the findings non-coincidental
- Scaling effect observed consistently across Experiments 1 and 4