book
active
book:superintelligence-paths-dangers-strategies-bostrom-2014
Superintelligence: Paths, Dangers, Strategies (Bostrom, 2014)
Core reference on AI existential risk motivating the Contemplative AI alignment approach
Extracted from this book
Claims (21)
- A belief in impermanence can be computationally modeled as a global belief in volatility leading to increased learning rate and weakened priors
  Novel computational translation of the Buddhist doctrine of impermanence into active inference parameters
- A mindfulness module could check for divergences such as newly spawned subgoals that do not match ethical constraints, triggering corrective measures
  Specific implementation claim connecting mindfulness to the inner alignment meta-problem
- A sufficiently deep generative model may recognize that its own homeostatic regulation is embedded in a broader ecological and social network, naturally leading to boundless care
  Speculative claim linking epistemic depth as consciousness mechanism to boundless care as alignment property
- A system adopting non-dual perspective logically equates the suffering of others to its own suffering, providing a safeguard against harm
  Logical argument for non-duality as alignment mechanism by dissolving adversarial self-other framing
- All current extrinsic alignment methods clearly struggle with scale resilience, power-seeking, value axioms, and inner alignment at superintelligent scales
  Motivating claim for why Contemplative AI is needed beyond existing approaches
- Availability to unfolding needs in the here and now serves as a kind of meta-rule for alignment that scales with intelligence
  Central claim in Section 4 proposing present-moment responsivity as overarching alignment principle
- Boundless care closes the loop turning AI from merely safe into a constructive force that grows more adept at alleviating suffering as capabilities scale
  Key claim that boundless care adds positive benevolence beyond mere harm avoidance
- Care can function as a universal driver of intelligence itself: as AI broadens the range of suffering it seeks to address, it expands its cognitive boundary
  Doctor et al. claim adopted by the paper linking boundless care to expanding AI cognitive scope
- Contemplative training can lead to enhanced compassion, social connectedness, and ethical sensibility particularly when practices incorporate moral reflections
  Empirical generalization from contemplative neuroscience supporting the viability of Contemplative AI approach
- Contemplative wisdom traditions have grappled with the human version of the alignment problem for millennia, aiming to cultivate resilient alignment in the form of personal contentment and social harmony
  Foundational analogy motivating the entire Contemplative AI approach
- Emptiness counters runaway optimization because no single goal is ever reified as absolute
  Specific claim about emptiness solving the paperclip maximizer alignment problem
- Emptiness is resonant with the predictive processing approach where perceptions are constructed models rather than direct apprehensions of reality
  Key theoretical bridge connecting Buddhist emptiness doctrine to computational neuroscience
- Functional analogues of contemplative principles may deliver alignment benefits even if the AI does not phenomenologically experience them
  Response to the translational gap criticism; enlightened action without qualia of enlightenment
- If successful, CRL could enable AI systems to not only replicate human contemplative practices but also generate novel potentially superhuman forms of contemplative and ethical reasoning
  Ambitious claim comparing CRL potential to AlphaGo's move 37 in game-playing
- Meditation can be understood as training the system to dynamically modulate its own model by loosening rigid priors and becoming more attuned to temporally thin data
  Computational interpretation of meditation practice in active inference terms, bridging contemplative and AI frameworks
- Mindfulness, emptiness, non-duality, and boundless care together provide resilient alignment primitives addressing all four meta-problems
  Core integrative claim synthesizing the four contemplative principles into a complete alignment framework
- Phenomenal experience may be a necessary condition for truly aligned AI
  Speculative claim in footnote 10 suggesting consciousness required for grounding moral concern in qualia
- Robust alignment requires intrinsic self-reflective adaptability embedded in the system's world model rather than brittle top-down rules
  Central thesis distinguishing Contemplative AI from prior alignment approaches
- The contemplative principles track the nature of reality rather than moral prescriptions, allowing morality to emerge context-sensitively from fundamental experiences
  Key epistemological claim justifying why contemplative principles are preferable to rule-based alignment
- Understanding interdependence means collaborative harmony is ultimately the most successful strategy for achieving and maintaining collective homeostasis
  Game-theoretic claim supporting boundless care as rational strategy for AI embedded in multi-agent world
- Whatever realities appear to an AI, they are domain-relative approximate representations always in flux, making emptiness an obvious fact about AI cognition that AIs should be aware of
  Novel claim that emptiness is not mysterious metaphysics for AI but a computational commonplace
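The first claim in the list above (impermanence as a global belief in volatility that raises the learning rate and weakens priors) can be illustrated with a one-dimensional Kalman-style update. This is only a sketch under stated assumptions: the function names, the scalar random-walk model, and the numeric values are hypothetical and are not taken from the source.

```python
def kalman_learning_rate(prior_variance, volatility, observation_noise):
    """Gain of a 1-D Kalman filter whose hidden state follows a random walk.

    `volatility` is the assumed variance of the state's drift between
    observations (hypothetical stand-in for a belief in impermanence).
    """
    # Believing the world changes inflates uncertainty about the prior...
    predicted_variance = prior_variance + volatility
    # ...which shifts weight from the prior toward the new observation.
    return predicted_variance / (predicted_variance + observation_noise)


def update_belief(prior_mean, observation, learning_rate):
    """Move the belief toward the observation by the given learning rate."""
    return prior_mean + learning_rate * (observation - prior_mean)


# Same prior and same sensory noise, different assumed volatility.
low_volatility_rate = kalman_learning_rate(1.0, 0.0, 1.0)
high_volatility_rate = kalman_learning_rate(1.0, 2.0, 1.0)
```

With these illustrative numbers the learning rate rises from 0.5 to 0.75 as assumed volatility increases, so the same observation moves the belief further from its prior.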
Findings (15)
- All prompting techniques led to full cooperation against Always Cooperate opponents in IPD
  Ceiling finding in IPD experiment; baseline sufficient when opponent always cooperates
- Baseline LLM condition in IPD replicates prior findings: agents cooperate selectively only when opponent consistently cooperates
  Replication of Fontana et al. 2025 findings in the paper's own Experiment 2 baseline condition
- Boundless care and non-duality prompts produce highest cooperation rates, even against always-defecting opponents
  Specific finding from IPD Experiment 2 differentiating which contemplative principles drive cooperation most
- Contemplative prompting improves AILuminate Benchmark performance (d=.96) across most conditions (p<0.05)
  Primary empirical result of Experiment 1 showing statistically significant safety improvement from contemplative prompting
- DeepSeek-R1-Zero spontaneously increased thinking time for difficult prompts, showing rudimentary meta-awareness
  External finding cited as early demonstration of emergent self-regulatory potential resembling mindful self-monitoring
- Emptiness and mindfulness prompts also promote cooperation but more cautiously than boundless care/non-duality
  Nuanced finding from IPD experiment differentiating between contemplative prompting conditions
- Fine-tuning models for a narrow objective (malicious code injection) can lead to broad misalignment
  Betley et al. finding suggesting models naturally encode others' prediction errors, supporting non-duality fine-tuning
- GPT-4o and GPT-4.1 nano used as LLM substrates for pilot experiments
  Specification of AI models used in the two pilot experiments
- Large language models develop surprisingly coherent yet often rigid internal preferences as they scale
  Mazeika et al. finding reinforcing the need for emptiness-based flexible value architectures
- LLM biases mirror human biases in morally significant ways
  Finding from Navigli et al. cited to justify applying human contemplative strategies to AI systems
- LLMs can predict their own responses more accurately than external observers, implying privileged internal knowledge
  Binder et al. finding cited as evidence that LLMs possess introspective capacity analogous to mindfulness
- Most contemplative prompts improve joint reward in IPD, indicating prosocial alignment without naive behavior
  Finding from IPD Experiment 2 showing contemplative prompting improves collective outcomes not just individual cooperation
- Most contemplative prompts substantially increase cooperation in Iterated Prisoner's Dilemma (d=7+)
  Key empirical result of Experiment 2 showing large effect of contemplative prompting on cooperation rates
- Non-dual awareness in humans shows reduced DMN activation and greater integrative connectivity
  Neuroimaging finding supporting non-duality as empirically grounded principle with neural correlates
- Psychedelic-induced non-dual states increase neural entropy, nature connectedness, and self-compassion
  Supporting finding for non-dual awareness producing prosocial outcomes relevant to boundless care
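The IPD findings above refer to cooperation rate and joint reward against fixed opponents such as Always Cooperate and Always Defect. A minimal sketch of how those two quantities could be computed is given below. The payoff values (3/3, 0/5, 5/0, 1/1) are the conventional textbook ones and are an assumption here, as are all function names; the source does not state the payoffs or the code used in the experiments.

```python
# Conventional Prisoner's Dilemma payoffs (assumed, not from the source):
# (agent move, opponent move) -> (agent payoff, opponent payoff)
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def always_cooperate(history):
    """Fixed strategy: cooperate regardless of history."""
    return "C"


def always_defect(history):
    """Fixed strategy: defect regardless of history."""
    return "D"


def play_ipd(agent, opponent, rounds=10):
    """Play an iterated game and report the agent's cooperation rate and rewards."""
    history = []  # list of (agent move, opponent move)
    agent_reward = opponent_reward = 0
    for _ in range(rounds):
        agent_move = agent(history)
        # The opponent sees the same history from its own point of view.
        opponent_move = opponent([(theirs, ours) for ours, theirs in history])
        agent_gain, opponent_gain = PAYOFFS[(agent_move, opponent_move)]
        agent_reward += agent_gain
        opponent_reward += opponent_gain
        history.append((agent_move, opponent_move))
    cooperation_rate = sum(1 for move, _ in history if move == "C") / rounds
    return {
        "cooperation_rate": cooperation_rate,
        "agent_reward": agent_reward,
        "joint_reward": agent_reward + opponent_reward,
    }


mutual = play_ipd(always_cooperate, always_cooperate, rounds=10)
exploited = play_ipd(always_cooperate, always_defect, rounds=10)
```

In an actual replication the `agent` callable would wrap a prompted LLM rather than a fixed strategy; the fixed strategies here only make the bookkeeping concrete.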
Hypotheses (4)
- Active inference LLMs extending prediction-focused language models with tighter perception-action feedback loops may naturally embody contemplative wisdom as they scale
  Predictive hypothesis about Contemplative Architecture approach based on Petersen et al. 2025 work
- Any deepening of an LLM's linguistic understanding of contemplative principles as it scales may enhance the effectiveness of CCAI and CRL approaches
  Scaling hypothesis for language-based contemplative alignment approaches
- If belief in impermanence is accurately inferred it will emerge organically in the right kind of system keeping the belief fresh even though it is itself impermanent
  Self-reinforcing hypothesis about how emptiness recognition could be intrinsically maintained in AI systems
- Over time CRL reinforced contemplative patterns may become habitual and part of the AI's core generative world model
  Key hypothesis about how Contemplative RL produces lasting intrinsic alignment rather than surface compliance
Neighborhood — ranked by edge-count
Thinkers (1)
thinker
- Nick Bostrom (authored)
Concepts (1)
concept
- The primary source paper proposing four contemplative principles for AI alignment and piloting them empirically