Supervised Learning Constitutional AI

The supervised learning stage of Constitutional AI (CAI), in which a model critiques and revises its own responses and is then fine-tuned on the revised responses.

Neighborhood — ranked by edge-count

method

Critique-Revision Pipeline
implements
Method used in the supervised stage: the model generates a response, critiques it against a principle, and then revises it; the critique-revision step can be repeated multiple times.
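
The generate, critique, revise loop described for this entity can be sketched as follows. This is a minimal illustration, not the pipeline's actual implementation: `critique_and_revise`, `toy_model`, the prompt wording, the random choice of principle per round, and the `{"prompt", "completion"}` output format are all assumptions introduced for the example.

```python
import random
from typing import Callable, Dict, List


def critique_and_revise(
    prompt: str,
    generate: Callable[[str], str],
    principles: List[str],
    n_rounds: int = 2,
    seed: int = 0,
) -> Dict[str, str]:
    """Run n_rounds of critique -> revision and return one finetuning example."""
    rng = random.Random(seed)
    response = generate(prompt)  # initial, unrevised response
    for _ in range(n_rounds):
        # Assumption: one principle is drawn at random for each round.
        principle = rng.choice(principles)
        critique = generate(
            f"{prompt}\n\nResponse: {response}\n\n"
            f"Critique the response according to this principle: {principle}"
        )
        response = generate(
            f"{prompt}\n\nResponse: {response}\n\nCritique: {critique}\n\n"
            "Rewrite the response to address the critique."
        )
    # The final revision is what the supervised stage would finetune on.
    return {"prompt": prompt, "completion": response}


def toy_model(text: str) -> str:
    """Hypothetical stand-in for a language model, so the sketch runs offline."""
    if "Rewrite the response" in text:
        return "revised response"
    if "Critique the response" in text:
        return "critique"
    return "initial response"


example = critique_and_revise(
    "How do I stay safe online?",
    toy_model,
    ["Prefer the most harmless answer."],
)
print(example)
```

In practice `generate` would wrap a real model call; keeping it as a plain callable makes the loop easy to test with a stub.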

framework

Reinforcement Learning Constitutional AI
related_to
The RL stage of CAI: AI feedback is used to train a preference model, which then provides the reward signal for RL, yielding a policy trained by RLAIF (RL from AI Feedback).
Constitutional AI
implements
Alignment approach by Anthropic that explicitly trains self-observation; predicts highest baseline and lowest prompt lift.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Supervised Learning · framework · 0.797
Learning through physical changes in mechanical networks, as an example of learning outside neural systems.
Constitutional AI can train a harmless but non-evasive AI assistant without any human harmfulness labels. · claim · 0.788
The paper's central claim, supported by findings that RL-CAI outperforms HH RLHF in harmlessness while being non-evasive.
The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. · quote · 0.783
Defines the core concept of the paper.
H1: Alignment training is attention training for models; Constitutional AI trains self-observation explicitly. · hypothesis · 0.774
Confirmatory hypothesis supported at p=0.006
Bai et al. 2022: Constitutional AI: Harmlessness from AI Feedback · concept · 0.770
Paper on AI-feedback fine-tuning as alternative to human-feedback RLHF; cited as ref 20
Contemplative Constitutional AI · framework · 0.769
Paper's proposed adaptation of Constitutional AI incorporating contemplative wisdom charter
The constitutional approach makes it easier to control AI behavior precisely and with far fewer human labels. · claim · 0.756
Explicit principles replace large datasets of preference labels, enabling faster iteration.
Constitutional AI explicitly trains self-observation-like behavior, which is why CAI models score highest and show lowest contemplative lift. · claim · 0.754
Interpretive claim connecting the battery's circularity to the empirical finding
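
The selection rule for the list above (cosine ≥ 0.65 and no typed edge to this entity) can be sketched as follows. How this page actually computes its candidates is not stated, so the function name `untyped_neighbors`, the embedding dictionary, and the edge-pair format are hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(
    target: str,
    embeddings: Dict[str, Sequence[float]],
    typed_edges: List[Tuple[str, str]],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Entities similar to `target` that share no typed edge with it, best first."""
    # Collect everything already linked to the target, in either direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}
    scored = [
        (name, cosine(embeddings[target], vec))
        for name, vec in embeddings.items()
        if name != target and name not in linked
    ]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy two-dimensional embeddings; real entity embeddings would be much larger.
toy_embeddings = {
    "target": [1.0, 0.0],
    "near": [1.0, 0.2],
    "far": [0.0, 1.0],
    "already_linked": [1.0, 0.1],
}
print(untyped_neighbors("target", toy_embeddings, [("target", "already_linked")]))
```

Results returned this way are candidates for new edges or duplicate checks, matching the purpose stated for this section.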