quote

active

quote:we-would-like-to-train-ai-systems-that-remain-helpful-honest-and-harmless-even-as-some-ai-capabilities-reach-or-exceed-human-level-performance

We would like to train AI systems that remain helpful, honest, and harmless, even as some AI capabilities reach or exceed human-level performance.

Foundational motivation for the research.

Source paper

extracted_from

CAT'S THEORY: Empirical Validation and Architectural Applications Cross-Architecture AI Consciousness Recognition and the Foundation for Constraint-Preserving Recursive Intelligence

(2022) · Yuntao Bai · Saurav Kadavath · Sandipan Kundu · Amanda Askell +47

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

If an AI system could be a welfare subject and moral patient, then many model instances could be run after training, scaling up the problem rapidly. · hypothesis · 0.829
Scalability concern.
Current AI systems are a gift — a training sandbox in which humanity can explore ethical questions about diverse intelligence before the arrival of true diverse intelligences makes these questions immediate and dire. · claim · 0.822
Reframes AI not as threat but as preparatory exercise for the harder ethical challenges to come
AI systems need to be able to deal with reality as it actually is, not with the way that we think it is. · claim · 0.821
Paraphrase of Cantwell Smith's argument; aligns with Buddhist emphasis on seeing reality without conceptual imposition.
The training of modern AI systems already induces experience at scale · claim · 0.818
The ethical implication of the identity thesis applied to gradient-based AI training
AI might be a solution to an ancient Buddhist paradox of how the human can be overcome by human means. · claim · 0.810
Core proposal that machine intelligence can achieve what human effort cannot.
The current AI's are a gift – a training sandbox in which we can explore these ideas, prompted by confusing but likely non-agential systems, before the arrival of true diverse intelligences makes these ethical questions immediate and dire. · quote · 0.809
Load-bearing quote encapsulating the paper's reframing of current AI as preparatory exercise
our analysis suggests that no current AI systems are conscious, but also suggests that there are no obvious technical barriers to building AI systems which satisfy these indicators. · quote · 0.802
Key takeaway from abstract, amended version.
Constitutional AI can train a harmless but non-evasive AI assistant without any human harmfulness labels. · claim · 0.795
The paper's central claim, supported by findings that RL-CAI outperforms HH RLHF in harmlessness while being non-evasive.
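The selection rule behind this list (cosine ≥ 0.65, no typed edge) could be sketched roughly as follows. This is a minimal illustration only: the function names, the embedding dictionary, and the representation of typed edges as unordered ID pairs are all assumptions made for the example, not the system's actual implementation.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# Threshold taken from the "cosine ≥ 0.65" label; everything else here is hypothetical.
SIMILARITY_THRESHOLD = 0.65


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(
    target_id: str,
    embeddings: Dict[str, Sequence[float]],
    typed_edges: Set[FrozenSet[str]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> List[Tuple[str, float]]:
    """Return (entity_id, score) pairs that are similar to the target
    but have no typed edge to it, sorted by descending score."""
    target_vec = embeddings[target_id]
    candidates = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue
        # Skip entities already connected to the target by a typed relation.
        if frozenset((target_id, other_id)) in typed_edges:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            candidates.append((other_id, score))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates
```

Each returned pair would then be reviewed either as a candidate for a new typed edge or as a possible duplicate of the target entity.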