question
active
What features need to activate / remain inactive for Claude to give advice on producing Chemical, Biological, Radiological, or Nuclear (CBRN) weapons?
Potential safety claim: suppressing specific features could prevent Claude from giving CBRN advice.
Source paper (relation: extracted_from)
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one. These are candidates for new edges or unrecognized duplicates.
- Direction for understanding a model's internal objectives via features.
- Question posed after discussing the sleeper agent threat model.
- Question about features related to consciousness and self-report.
- Open question from the discussion on future research directions.
- Case study demonstrating the mechanism behind flat harness-updating: smaller models reach the same procedural content.
- Key finding about the relationship between capability and introspection.
- Constitutional AI fingerprint in the dimension profile: training that makes models self-observant also makes them polished at the cost of aliveness.
- Automated interpretability analysis of activations confirms that features are more interpretable than neurons.