hypothesis
active
It remains unclear what the underlying base rate of consciousness self-reports would be in systems identical to frontier models but without consciousness-denial fine-tuning
Open question about RLHF effects on base model behavior
Source paper
extracted_from (2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd
Neighborhood — ranked by edge-count
Questions (1)
question
- Open empirical question requiring access to base models
Artifacts (1)
artifact
- Key paper finding structured first-person descriptions in LLMs claiming awareness or subjective experience during self-referential processing.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Open question about RLHF confound; requires access to base models for resolution
- Diagnosis of why the thesis feels counterintuitive
- The paper's claim that theoretical convergence across GWT, RPT, HOT, IIT makes the findings non-coincidental
- Interpretive claim from Experiment 2 bridging consciousness claims and representational honesty
- Explicit scope delimitation that situates the paper's claims within interpretability rather than consciousness science
- The paper's reformulation of the core open question after establishing systematic self-reports
- Prior finding cited to motivate the study, showing that large models endorse consciousness statements more than other attitude-related statements
- Comparative prediction motivating future work contrasting different approaches to LLM self-knowledge