quote

active

quote:the-only-human-oversight-is-provided-through-a-list-of-rules-or-principles-and-so-we-refer-to-the-method-as-constitutional-ai

The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'.

Defines the core concept of the paper.

Source paper

extracted_from

(2022) · Bai, Yuntao · Saurav Kadavath · Sandipan Kundu · Amanda Askell +47

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

The constitutional approach makes it easier to control AI behavior precisely and with far fewer human labels. · claim · 0.855
Explicit principles replace large datasets of preference labels, enabling faster iteration.
Constitutional AI can train a harmless but non-evasive AI assistant without any human harmfulness labels. · claim · 0.817
The paper's central claim, supported by findings that RL-CAI outperforms HH RLHF in harmlessness while being non-evasive.
Bai et al. 2022: Constitutional AI — harmlessness from AI feedback · concept · 0.811
Paper on AI-feedback fine-tuning as alternative to human-feedback RLHF; cited as ref 20.
Constitutional AI methods can be applied broadly to steer model behavior, e.g., writing style, tone, persona, not just harmlessness. · claim · 0.809
Discussion section suggests generalizability beyond harmlessness.
Constitutional AI · framework · 0.809
Alignment approach by Anthropic that explicitly trains self-observation; predicts highest baseline and lowest prompt lift.
AI can be seen to display care of its own, and is hence not a mere tool for the expression of human care. · claim · 0.804
Ethical conclusion about the status of AI.
Consciousness in AI is best assessed by drawing on neuroscientific theories of consciousness. · claim · 0.798
Central methodological claim of the paper.
Constitutional AI: Harmlessness from AI Feedback (Bai et al. 2022b) · concept · 0.798
Constitutional AI method whose constitutions, if changed, could trigger alignment faking.
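
The neighbor list above pairs each entity with a cosine score of at least 0.65 and leaves out entities that already share a typed edge with this one. A minimal sketch of such a filter follows, assuming embeddings are plain lists of floats and typed edges are given as a set of entity IDs. The function names, entity IDs, and toy vectors are hypothetical and are not taken from the knowledge base's actual implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def untyped_neighbors(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs with score >= threshold against the
    target, skipping entities already linked to it by a typed edge.
    Results are sorted by score, highest first."""
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id or other_id in typed_edges:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, round(score, 3)))
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Toy, made-up vectors for illustration only.
embeddings = {
    "quote:cai": [1.0, 0.0],
    "claim:a": [0.9, 0.1],    # close to the target, no typed edge
    "claim:b": [0.0, 1.0],    # orthogonal to the target, below threshold
    "paper:src": [1.0, 0.0],  # identical direction, but already typed
}
typed_edges = {"paper:src"}   # e.g. the extracted_from relation
result = untyped_neighbors("quote:cai", embeddings, typed_edges)
```

Here `result` holds only `claim:a`: `paper:src` is skipped because it already has a typed edge, and `claim:b` falls below the threshold.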

same_concept_as · Nomic cosine

External markdown files that talk about the same concept as this entity.

aboutblank_kb
Can organizations maintain ethical integrity while developing powerful AI tools without enforcing explicit normative or ethical frameworks? · questions/can-organizations-maintain-ethical-integrity-while-developing-powerful.md · 0.795