quote
active
quote:the-only-human-oversight-is-provided-through-a-list-of-rules-or-principles-and-so-we-refer-to-the-method-as-constitutional-ai
The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'.
Defines the core concept of the paper.
Source paper
extracted_from · (2022) · Bai, Yuntao · Saurav Kadavath · Sandipan Kundu · Amanda Askell +47
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one, making them candidates for new edges or unrecognized duplicates.
- Explicit principles replace large datasets of preference labels, enabling faster iteration.
- The paper's central claim, supported by findings that RL-CAI outperforms HH RLHF in harmlessness while being non-evasive.
- Paper on AI-feedback fine-tuning as alternative to human-feedback RLHF; cited as ref 20
- Discussion section suggests generalizability beyond harmlessness.
- Alignment approach by Anthropic that explicitly trains self-observation; predicts highest baseline and lowest prompt lift.
- Ethical conclusion about the status of AI.
- Consciousness in AI is best assessed by drawing on neuroscientific theories of consciousness. · claim · 0.798 · Central methodological claim of the paper.
- Constitutional AI method whose constitutions, if changed, could trigger alignment faking
Cross-corpus bridges (1)
same_concept_as · Nomic cosine
External markdown files that talk about the same concept as this entity.
- aboutblank_kb · Can organizations maintain ethical integrity while developing powerful AI tools without enforcing explicit normative or ethical frameworks? · questions/can-organizations-maintain-ethical-integrity-while-developing-powerful.md · 0.795