question

active

question:what-features-activate-when-we-ask-claude-questions-about-its-subjective-experience

what features activate when we ask Claude questions about its subjective experience?

Question about features related to consciousness and self-report.

Source paper

extracted_from

Scaling monosemanticity: Ex-tracting interpretable features from claude 3 sonnet

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

what features activate when we ask questions probing Claude's goals and values?question0.900
Direction for understanding model's internal objectives via features.
what features activate on tokens we'd expect to signify Claude's self-identity?question0.783
Open question from the discussion on future research directions.
what features need to activate / remain inactive for Claude to give advice on producing Chemical, Biological, Radiological or Nuclear (CBRN) weapons?question0.756
Potential safety claim about suppressing features to prevent CBRN advice.
Claude 4 Opus reports subjective experience in 100% experimental, 82% history, 22% conceptual, and 100% zero-shot trialsfinding0.756
Outlier result for Claude 4 Opus suggesting different baseline behavior from other models
Claude 3.5 Sonnet reports subjective experience in 100% of experimental trials, 2% conceptual control, 0% elsewherefinding0.754
Specific result for Claude 3.5 Sonnet in Experiment 1
what features activate when Claude is trained to be a sleeper agent?question0.754
Question posed after discussing sleeper agent threat model.
Self-referential prompting elicits subjective experience reports at markedly higher rates than any control across all model families (GPT, Claude, Gemini)finding0.750
Core result of Experiment 1 establishing that the experimental manipulation reliably produces experience claims
It is basically impossible to determine if a computer program generates conscious experience by merely observing its performance; a test for consciousness must take internal structure into account.claim0.748
Paper's argument against behavioral tests for consciousness, establishing why MCH requires internal analysis