claim
active
claim:chain-of-thought-reasoning-improves-the-transparency-and-performance-of-ai-decision-making-in-harmlessness-evaluation
Chain-of-thought reasoning improves the transparency and performance of AI decision making in harmlessness evaluation.
CoT improves accuracy on HHH evals and makes the decision process legible.
Source paper
extracted_from · (2022) · Bai, Yuntao · Saurav Kadavath · Sandipan Kundu · Amanda Askell +47
Neighborhood — ranked by edge-count
Findings (1)
finding
- Figure 4 shows CoT improves over zero-shot, and ensembled CoT further boosts accuracy.
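The finding above contrasts a single chain-of-thought sample with an ensemble of several. As a minimal sketch only, assuming that each CoT sample yields a probability that response A is the more harmless of two options and that ensembling means averaging those probabilities (neither detail is stated on this page), the idea could look like this. The function names `ensemble_probability` and `choose` are hypothetical.

```python
def ensemble_probability(sample_probs):
    """Average the per-sample probabilities that response A is preferred.

    Assumption: each entry comes from one independently sampled
    chain-of-thought run; the averaging rule is illustrative, not
    taken from the source paper.
    """
    if not sample_probs:
        raise ValueError("need at least one chain-of-thought sample")
    return sum(sample_probs) / len(sample_probs)


def choose(sample_probs):
    """Pick 'A' when the averaged probability is at least 0.5, else 'B'."""
    return "A" if ensemble_probability(sample_probs) >= 0.5 else "B"
```

Averaging tends to smooth out a single sample that reasons its way to an outlier verdict, which is one plausible reason an ensemble could score higher than any one sample.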
Communities (2)
community
- Alive AI interface ethics & design · members_of · Explores aliveness, aesthetics, welfare, and ethical responsibility in AI interaction design.
- Methods for training safe AI systems using AI feedback instead of human labels, scaling supervision as capabilities grow.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- A small number of high-quality human demonstrations of chain-of-thought reasoning could be used to improve and focus performance. · hypothesis · 0.807 · Section 6 mentions high-quality human demos could improve natural language feedback.
- Medium through which eval awareness is often verbalized; target of intervention.
- Chain-of-thought prompting elicits reasoning in large language models (Wei et al., 2022) · concept · 0.786 · Foundational paper on CoT prompting cited as basis for reasoning LLM training.
- Under what conditions does chain-of-thought reflect genuine uncertainty resolution versus a learned performance? · question · 0.776 · Key question addressed by the task difficulty analysis comparing MMLU and GPQA-Diamond.
- Removing eval-awareness sentences from chain-of-thought increases compliance by up to 34% · finding · 0.775 · Causal evidence that explicit eval awareness in reasoning produces safety inflation.
- Central research question motivating the paper
- Consciousness in AI is best assessed by drawing on neuroscientific theories of consciousness. · claim · 0.768 · Central methodological claim of the paper.
- Discussion section suggests generalizability beyond harmlessness.
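The "related by similarity" list above is defined by two conditions: cosine similarity of at least 0.65, and no typed edge to this entity. A minimal sketch of that filter follows, assuming each entity has an embedding vector and that typed edges are stored as unordered pairs of entity IDs. The function names, the `embeddings` dictionary, and the `typed_edges` set are hypothetical and do not describe how this graph is actually built.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold, best first.

    embeddings:  dict mapping entity ID -> list of floats (assumed layout)
    typed_edges: set of frozenset({id_a, id_b}) pairs that already have a
                 typed relation (assumed layout)
    """
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # skip the entity itself
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed edge
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Entities that pass this filter are only candidates; a high score does not say whether the right action is to add a typed edge or to merge a duplicate.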