community

active

leiden_hybrid_concepts

label: haiku

community:leiden_hybrid_concepts-run4-c6-c7

AI-supervised alignment and scalable oversight

Methods for training safe AI systems using AI feedback instead of human labels, scaling supervision as capabilities grow.

6 members.

Drawn from 3 sources

The papers/notes whose extracted claims & findings make up this cluster.

CAT'S THEORY: Empirical Validation and Architectural Applications Cross-Architecture AI Consciousness Recognition and the Foundation for Constraint-Preserving Recursive Intelligence (4 members)
Toward an ethics of autopoietic technology: Stress, care, and intelligence (1 member)
2026-05-14_phil-trans-A-goodfire-aboutblank-impact.md (1 member)

Bridges (2)

Other communities that share members with this one: cross-cutting threads or papers that sit at the seam between two themes.

Alive AI interface ethics & design (6 shared)
Stress-Care-Intelligence Loop Framework (1 shared)

Claims (6)

AI feedback can effectively replace human feedback for harmlessness in RLHF-style training. The paper demonstrates that RLAIF with constitutional principles matches or exceeds HH RLHF.
All intelligent agents—biological, technological, or hybrid—can be assessed via stress-care-intelligence loops regardless of substrate.
Chain-of-thought reasoning improves the transparency and performance of AI decision making in harmlessness evaluation. CoT improves accuracy on HHH evals and makes the decision process legible.
Constitutional AI can train a harmless but non-evasive AI assistant without any human harmfulness labels. The paper's central claim, supported by findings that RL-CAI outperforms HH RLHF in harmlessness while being non-evasive.
Scaling supervision through AI self-improvement is feasible and may be necessary as AI capabilities advance. The paper provides evidence that AI can help supervise AI, reducing reliance on humans.
Introducing AI agents into human-learning populations reduces costly individual learning and depletes information supply.