claim
active
claim:constitutional-ai-explicitly-trains-self-observation-like-behavior-which-is-why-cai-models-score-highest-and-show-lowest-contemplative-lift
Constitutional AI explicitly trains self-observation-like behavior, which is why CAI models score highest and show lowest contemplative lift.
Interpretive claim connecting the battery's circularity to the empirical finding
Source paper
extracted_from(2026) · Borzov, Anton
Neighborhood · ranked by edge count
Findings (1)
finding
- Constitutional AI fingerprint in the dimension profile: training that makes models self-observant also makes them polished at the cost of aliveness.
Claims (1)
claim
- Central interpretive claim from statistical analysis
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Constitutional AI models show mean contemplative lift of only +0.81, while SFT models lift +3.18 · finding · 0.853
  Constitutional AI training provides internally what the contemplative prompt provides externally.
- H1: Alignment training is attention training for models; Constitutional AI trains self-observation explicitly. · hypothesis · 0.835
  Confirmatory hypothesis supported at p = 0.006.
- Discussion section suggests generalizability beyond harmlessness.
- Explicit principles replace large datasets of preference labels, enabling faster iteration.
- Consciousness in AI is best assessed by drawing on neuroscientific theories of consciousness. · claim · 0.814
  Central methodological claim of the paper.
- The paper's central claim, supported by findings that RL-CAI outperforms HH RLHF in harmlessness while being non-evasive.
- Exploratory interpretation of Chinese model performance under contemplative prompt
- Interpretive finding from dimension profile analysis: training for honest limits comes at the cost of aliveness.
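The "Related by similarity" rule (cosine ≥ 0.65, no typed edge) can be sketched as a filter over entity embeddings. This is a minimal illustration, not the tool's actual implementation: the function names, the representation of embeddings as plain lists of floats, and the treatment of typed edges as undirected `(source, target)` pairs are all assumptions.

```python
import math

# Threshold taken from the "cosine ≥ 0.65" label above.
SIMILARITY_THRESHOLD = 0.65


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=SIMILARITY_THRESHOLD):
    """Hypothetical helper: entities at or above the threshold with no typed edge to target_id.

    Assumption: typed_edges is a list of (source, target) id pairs, treated as undirected.
    Returns (entity_id, score) pairs sorted by descending similarity.
    """
    target_vec = embeddings[target_id]
    # Collect every entity already linked to the target by a typed edge, in either direction.
    linked = {b for a, b in typed_edges if a == target_id}
    linked |= {a for a, b in typed_edges if b == target_id}

    candidates = []
    for entity_id, vec in embeddings.items():
        if entity_id == target_id or entity_id in linked:
            continue  # skip the entity itself and anything with a typed relation
        score = cosine(target_vec, vec)
        if score >= threshold:
            candidates.append((entity_id, score))

    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates


if __name__ == "__main__":
    # Toy two-dimensional vectors, invented purely for illustration.
    embeddings = {
        "claim": [1.0, 0.0],
        "a": [1.0, 0.0],  # identical vector, but excluded by its typed edge
        "b": [0.0, 1.0],  # orthogonal, similarity 0.0, below threshold
        "c": [1.0, 1.0],  # similarity about 0.707
        "d": [2.0, 0.5],  # similarity about 0.970
    }
    typed_edges = [("claim", "a")]
    for entity_id, score in related_by_similarity("claim", embeddings, typed_edges):
        print(f"{entity_id}: {score:.3f}")
```

Excluding typed neighbors before thresholding is what makes the remaining list useful as a review queue: anything that still scores high is either a missing relation or a possible duplicate.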