framework
active
framework:supervised-learning-constitutional-ai
Supervised Learning Constitutional AI
The supervised learning stage of Constitutional AI (CAI), in which a model critiques and revises its own responses and is then finetuned on the revised responses.
Neighborhood — ranked by edge-count
Methods (1)
- Critique-Revision Pipeline · implements · Supervised-stage method: the model generates a response, critiques it according to a principle, then revises it; the critique and revision steps are repeated multiple times.
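
The critique-revision loop described above can be sketched as follows. This is a minimal illustration, not the original implementation: the `generate` callable, the prompt wording, the function names, and the choice to sample one principle at random per round are all assumptions made for the example.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: any function that maps a prompt string to generated text.
Generate = Callable[[str], str]


def critique_and_revise(
    generate: Generate,
    prompt: str,
    principles: List[str],
    n_rounds: int = 2,
    rng: Optional[random.Random] = None,
) -> str:
    """Generate a response, then critique and revise it n_rounds times."""
    rng = rng or random.Random(0)
    response = generate(prompt)
    for _ in range(n_rounds):
        # Assumption: one principle is drawn at random for each round.
        principle = rng.choice(principles)
        critique = generate(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique the response according to this principle: {principle}"
        )
        response = generate(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return response


def build_finetuning_set(
    generate: Generate,
    prompts: List[str],
    principles: List[str],
    n_rounds: int = 2,
) -> List[Tuple[str, str]]:
    """Pair each prompt with its final revision; these pairs are the finetuning data."""
    rng = random.Random(0)
    return [
        (p, critique_and_revise(generate, p, principles, n_rounds, rng))
        for p in prompts
    ]
```

Each prompt costs 1 + 2 × `n_rounds` calls to `generate`: one for the initial response, plus one critique and one revision per round. Only the final revision is kept as the finetuning target in this sketch.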
Frameworks (2)
- Reinforcement Learning Constitutional AI · related_to · The RL stage of CAI, which uses AI feedback to train a preference model and then applies RL, resulting in a policy trained by RLAIF.
- Constitutional AI · implements · Alignment approach by Anthropic that explicitly trains self-observation; predicts highest baseline and lowest prompt lift.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Learning through physical changes in mechanical networks, as an example of learning outside neural systems.
- The paper's central claim, supported by findings that RL-CAI outperforms HH RLHF in harmlessness while being non-evasive.
- Defines the core concept of the paper.
- H1: Alignment training is attention training for models; Constitutional AI trains self-observation explicitly. · hypothesis · 0.774 · Confirmatory hypothesis supported at p=0.006
- Paper on AI-feedback fine-tuning as alternative to human-feedback RLHF; cited as ref 20
- Paper's proposed adaptation of Constitutional AI incorporating contemplative wisdom charter
- Explicit principles replace large datasets of preference labels, enabling faster iteration.
- Interpretive claim connecting the battery's circularity to the empirical finding