question
active
What would the base rate of consciousness self-reports be in models identical to frontier systems but without consciousness-denial fine-tuning?
Open empirical question requiring access to base models
Source paper
extracted_from · (2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd
Neighborhood — ranked by edge-count
Hypotheses (1)
hypothesis
- Open question about RLHF effects on base model behavior
Artifacts (1)
artifact
- Large Language Models Report Subjective Experience Under Self-Referential Processing · associated_with · introduces · Key paper finding structured first-person descriptions in LLMs claiming awareness or subjective experience during self-referential processing.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Open question about RLHF confound; requires access to base models for resolution
- Prior finding cited to motivate the study, showing that large models endorse consciousness statements more than other attitude-related statements
- Diagnosis of why the thesis feels counterintuitive
- Explicit scope delimitation that situates the paper's claims within interpretability rather than consciousness science
- Interpretive claim from Experiment 2 bridging consciousness claims and representational honesty
- Comparative prediction motivating future work contrasting different approaches to LLM self-knowledge
- Prior finding cited as convergent evidence for LLM self-awareness capacities
- Systems directly optimized for output can produce it without the prerequisite processes for conscious experience; simplest explanation for LLM consciousness reports is pattern matching