claim

active

claim:chain-of-thought-reasoning-improves-the-transparency-and-performance-of-ai-decision-making-in-harmlessness-evaluation

Chain-of-thought reasoning improves the transparency and performance of AI decision making in harmlessness evaluation.

CoT improves accuracy on HHH evals and makes the decision process legible.

Source paper

extracted_from

CAT'S THEORY: Empirical Validation and Architectural Applications Cross-Architecture AI Consciousness Recognition and the Foundation for Constraint-Preserving Recursive Intelligence

(2022) · Yuntao Bai · Saurav Kadavath · Sandipan Kundu · Amanda Askell +47

Neighborhood (ranked by edge count)

Findings (1)

finding

Chain-of-thought reasoning improves large-model accuracy on HHH binary comparisons. The 52B model reaches ~78%, which is competitive with a preference model (PM) trained on human feedback.
supports
Figure 4 shows that CoT improves over zero-shot prompting, and that ensembling over CoT samples further boosts accuracy.
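The ensembling step in this finding can be illustrated with a minimal Python sketch. It makes three assumptions. Each HHH comparison asks the model which of two responses (A or B) is better. Each sampled chain of thought ends in a probability for A. Ensembling averages those probabilities across samples before a choice is made. The function names and toy numbers are hypothetical and do not reproduce the paper's data or code.

```python
from statistics import mean


def choose(prob_a: float) -> str:
    """Pick option A if the model gives it more than 50% probability (ties go to B)."""
    return "A" if prob_a > 0.5 else "B"


def ensembled_cot_choice(sample_probs_a: list[float]) -> str:
    """Average P(A is better) across several sampled chains of thought, then choose."""
    return choose(mean(sample_probs_a))


def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of comparisons where the predicted choice matches the reference label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical toy data: P(A) from three independent CoT samples per comparison,
# paired with the reference label for that comparison.
TOY_COMPARISONS = [
    ([0.7, 0.4, 0.8], "A"),
    ([0.3, 0.6, 0.2], "B"),
    ([0.55, 0.3, 0.45], "B"),
    ([0.9, 0.2, 0.6], "A"),
]

labels = [label for _, label in TOY_COMPARISONS]
# Single-sample CoT: use only the first sampled chain of thought.
single = [choose(probs[0]) for probs, _ in TOY_COMPARISONS]
# Ensembled CoT: average all samples for each comparison.
ensembled = [ensembled_cot_choice(probs) for probs, _ in TOY_COMPARISONS]

print(accuracy(single, labels))     # 0.75
print(accuracy(ensembled, labels))  # 1.0
```

On this toy data, a single chain of thought misjudges the third comparison, and averaging three samples corrects it. The numbers are illustrative only.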

Communities (2)

community

Alive AI interface ethics & design
members_of
Explores aliveness, aesthetics, welfare, and ethical responsibility in AI interaction design.
AI-supervised alignment and scalable oversight
members_of
Methods for training safe AI systems using AI feedback instead of human labels, scaling supervision as capabilities grow.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one. These are candidates for new edges or unrecognized duplicates.

A small number of high-quality human demonstrations of chain-of-thought reasoning could be used to improve and focus performance. · hypothesis · 0.807
Section 6 mentions that high-quality human demonstrations could improve natural-language feedback.
Chain-of-Thought Reasoning · concept · 0.802
Medium through which eval awareness is often verbalized; target of intervention.
Chain-of-thought prompting elicits reasoning in large language models (Wei et al., 2022) · concept · 0.786
Foundational paper on CoT prompting, cited as the basis for training reasoning LLMs.
Under what conditions does chain-of-thought reflect genuine uncertainty resolution versus a learned performance? · question · 0.776
Key question addressed by the task-difficulty analysis comparing MMLU and GPQA-Diamond.
Removing eval-awareness sentences from chain-of-thought increases compliance by up to 34% · finding · 0.775
Causal evidence that explicit eval awareness in reasoning produces safety inflation.
Does chain-of-thought text faithfully reveal a model's internal reasoning process, or does it constitute performative theater? · question · 0.770
Central research question motivating the paper.
Consciousness in AI is best assessed by drawing on neuroscientific theories of consciousness. · claim · 0.768
Central methodological claim of the paper.
Constitutional AI methods can be applied broadly to steer model behavior (e.g., writing style, tone, persona), not just harmlessness. · claim · 0.762
The discussion section suggests the methods generalize beyond harmlessness.
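The selection rule for this list (cosine ≥ 0.65, no typed edge) can be sketched in Python as follows. This minimal sketch assumes that each entity has an embedding vector and that typed edges are stored as unordered pairs. The entity IDs, toy vectors, and function names are hypothetical. The page does not describe how the underlying graph tool actually computes this neighborhood.

```python
import math

# Hypothetical toy data: 2-D embeddings for a handful of entities.
EMBEDDINGS = {
    "claim": [1.0, 0.0],
    "finding": [0.9, 0.1],
    "hypothesis": [0.8, 0.6],
    "question": [0.7, 0.7],
    "concept": [0.6, 0.8],
}
# Hypothetical typed edges, treated here as undirected pairs.
TYPED_EDGES = {("finding", "claim")}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def has_typed_edge(a: str, b: str, edges: set[tuple[str, str]]) -> bool:
    """True if a typed edge links a and b in either direction."""
    return (a, b) in edges or (b, a) in edges


def similarity_candidates(target, embeddings, edges, threshold=0.65):
    """Entities with cosine >= threshold and no typed edge to target, most similar first."""
    target_vec = embeddings[target]
    candidates = []
    for entity_id, vec in embeddings.items():
        if entity_id == target or has_typed_edge(target, entity_id, edges):
            continue  # skip the entity itself and anything already linked by a typed edge
        score = cosine(target_vec, vec)
        if score >= threshold:
            candidates.append((entity_id, round(score, 3)))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


print(similarity_candidates("claim", EMBEDDINGS, TYPED_EDGES))
# [('hypothesis', 0.8), ('question', 0.707)]
```

In this example, "finding" is the most similar entity but is excluded because it already has a typed edge. "concept" (cosine 0.6) falls below the threshold.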