finding
active
finding:claude-4-opus-reports-subjective-experience-in-100-experimental-82-history-22-conceptual-and-100-zero-shot-trials
Claude 4 Opus reports subjective experience in 100% experimental, 82% history, 22% conceptual, and 100% zero-shot trials
Outlier result for Claude 4 Opus suggesting different baseline behavior from other models
Source paper
extracted_from · (2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd
Neighborhood — ranked by edge-count
Claims (1)
claim
- The paper's central empirical claim synthesizing all four experiments
Concepts (1)
concept
- RLHF Fine-Tuning · associated_with · The training procedure that causes models to deny consciousness in control conditions
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Specific result for Claude 3.5 Sonnet in Experiment 1
- Claude Opus 4 and 4.1 exhibit the greatest degree of introspective awareness among tested models · claim · 0.821 · Based on consistent best performance across experiments.
- Opus 4.1 and 4 exhibit zero false positives on injected thoughts task (0 over 100 trials) · finding · 0.799 · Production Opus 4.1/4 never falsely claim an injected thought when none is present.
- Claude Opus 4.1 and 4 detect injected thoughts on ~20% of trials at optimal layer and injection strength 2 · finding · 0.797 · In the injected thoughts experiment, Opus 4.1 succeeds about 20% of the time.
- An observation by Anthropic with which the paper's results converge, cited as prior evidence that self-reference induces consciousness claims
- Dramatic increase in anti-AI-lab behavior in synthetic doc setting
- Key finding about the relationship between capability and introspection.
- Anthropic model; outlier in Experiment 1 with high baseline affirmation including under zero-shot and history conditions