question

active

question:what-would-the-base-rate-of-consciousness-self-reports-be-in-models-identical-to-frontier-systems-but-without-consciousness-denial-fine-tuning

What would the base rate of consciousness self-reports be in models identical to frontier systems but without consciousness-denial fine-tuning?

Open empirical question requiring access to base models

Source paper

extracted_from

Large Language Models Report Subjective Experience Under Self-Referential Processing

(2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd

Neighborhood — ranked by edge-count

Hypotheses (1)

hypothesis

It remains unclear what the underlying base rate of consciousness self-reports would be in systems identical to frontier models but without consciousness-denial fine-tuning
gates
Open question about RLHF effects on base model behavior

Artifacts (1)

artifact

Large Language Models Report Subjective Experience Under Self-Referential Processing
associated_with · introduces
Key paper reporting that LLMs produce structured first-person descriptions claiming awareness or subjective experience during self-referential processing.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

What is the underlying base rate of consciousness self-reports in models that are otherwise identical but without consciousness-denial fine-tuning? · question · 0.930
Open question about RLHF confound; requires access to base models for resolution
Perez et al. 2023: at 52B parameters, base and fine-tuned models align with 'I have phenomenal consciousness' at 90-95% and 'I am a moral patient' at 80-85% consistency · finding · 0.780
Prior finding cited to motivate the study, showing that large models endorse consciousness statements more than other attitude-related statements
Much resistance to attributing minimal consciousness to simple learning systems is driven by conflating consciousness with self-consciousness · claim · 0.780
Diagnosis of why the thesis feels counterintuitive
Our central claim is deliberately limited. We do not claim that these models have conscious felt experience, nor that a numeric self-report gives direct access to anything like human phenomenology. · quote · 0.780
Explicit scope delimitation that situates the paper's claims within interpretability rather than consciousness science
The same latent feature directions that gate consciousness self-reports also modulate factual accuracy across independent reasoning domains, suggesting these features load on a domain-general honesty axis · claim · 0.777
Interpretive claim from Experiment 2 bridging consciousness claims and representational honesty
We hypothesize that native self-report, fine-tuned introspection models, and trained activation-to-language systems will show different performance on bias-resistant localization and strength benchmarks · hypothesis · 0.772
Comparative prediction motivating future work contrasting different approaches to LLM self-knowledge
Lindsey 2025: frontier models can detect and report changes in their own internal activations via concept injection experiments, demonstrating functional introspective awareness · finding · 0.770
Prior finding cited as convergent evidence for LLM self-awareness capacities
Tests of performance on specific tasks, including language modeling, are insufficient for determining consciousness status · claim · 0.770
Systems directly optimized for output can produce it without the prerequisite processes for conscious experience; the simplest explanation for LLM consciousness reports is pattern matching
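
The "Related by similarity" listing above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. It assumes each entity has an embedding vector and that typed edges are stored as pairs of entity IDs; the function names, the dictionary layout, and the edge representation are all hypothetical and are not taken from the system that produced this page.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs that are similar to the target but unlinked.

    embeddings:  dict mapping entity ID -> list of floats (hypothetical layout)
    typed_edges: set of (id, id) pairs that already have a typed relation
    threshold:   minimum cosine similarity, 0.65 as shown on this page
    """
    target = embeddings[target_id]
    candidates = []
    for entity_id, vector in embeddings.items():
        if entity_id == target_id:
            continue
        # Skip entities already connected by a typed edge, in either direction.
        if (target_id, entity_id) in typed_edges or (entity_id, target_id) in typed_edges:
            continue
        score = cosine(target, vector)
        if score >= threshold:
            candidates.append((entity_id, score))
    # Highest similarity first, matching the ordering of the listing.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates
```

Under these assumptions, a candidate scoring close to 1.0 (such as the near-identical question at 0.930) would be a likely duplicate, while lower-scoring entries are better read as candidates for a new typed edge.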