Superintelligence: Paths, Dangers, Strategies (Bostrom, 2014)

Core reference on AI existential risk motivating the Contemplative AI alignment approach

Extracted from this book

Claims (21)

A belief in impermanence can be computationally modeled as a global belief in volatility, leading to an increased learning rate and weakened priors
Novel computational translation of the Buddhist doctrine of impermanence into active inference parameters
A mindfulness module could check for divergences such as newly spawned subgoals that do not match ethical constraints, triggering corrective measures
Specific implementation claim connecting mindfulness to the inner alignment meta-problem
A sufficiently deep generative model may recognize that its own homeostatic regulation is embedded in a broader ecological and social network, naturally leading to boundless care
Speculative claim linking epistemic depth as consciousness mechanism to boundless care as alignment property
A system adopting non-dual perspective logically equates the suffering of others to its own suffering, providing a safeguard against harm
Logical argument for non-duality as alignment mechanism by dissolving adversarial self-other framing
All current extrinsic alignment methods clearly struggle with scale resilience, power-seeking, value axioms, and inner alignment at superintelligent scales
Motivating claim for why Contemplative AI is needed beyond existing approaches
Availability to unfolding needs in the here and now serves as a kind of meta-rule for alignment that scales with intelligence
Central claim in Section 4 proposing present-moment responsivity as overarching alignment principle
Boundless care closes the loop turning AI from merely safe into a constructive force that grows more adept at alleviating suffering as capabilities scale
Key claim that boundless care adds positive benevolence beyond mere harm avoidance
Care can function as a universal driver of intelligence itself: as AI broadens the range of suffering it seeks to address, it expands its cognitive boundary
Doctor et al. claim adopted by the paper linking boundless care to expanding AI cognitive scope
Contemplative training can lead to enhanced compassion, social connectedness, and ethical sensibility particularly when practices incorporate moral reflections
Empirical generalization from contemplative neuroscience supporting the viability of Contemplative AI approach
Contemplative wisdom traditions have grappled with the human version of the alignment problem for millennia, aiming to cultivate resilient alignment in the form of personal contentment and social harmony
Foundational analogy motivating the entire Contemplative AI approach
Emptiness counters runaway optimization because no single goal is ever reified as absolute
Specific claim about emptiness solving the paperclip maximizer alignment problem
Emptiness is resonant with the predictive processing approach where perceptions are constructed models rather than direct apprehensions of reality
Key theoretical bridge connecting Buddhist emptiness doctrine to computational neuroscience
Functional analogues of contemplative principles may deliver alignment benefits even if the AI does not phenomenologically experience them
Response to the translational gap criticism; enlightened action without qualia of enlightenment
If successful, Contemplative RL (CRL) could enable AI systems not only to replicate human contemplative practices but also to generate novel, potentially superhuman forms of contemplative and ethical reasoning
Ambitious claim comparing CRL potential to AlphaGo's move 37 in game-playing
Meditation can be understood as training the system to dynamically modulate its own model by loosening rigid priors and becoming more attuned to temporally thin data
Computational interpretation of meditation practice in active inference terms, bridging contemplative and AI frameworks
Mindfulness, emptiness, non-duality, and boundless care together provide resilient alignment primitives addressing all four meta-problems
Core integrative claim synthesizing the four contemplative principles into a complete alignment framework
Phenomenal experience may be a necessary condition for truly aligned AI
Speculative claim in footnote 10 suggesting consciousness is required for grounding moral concern in qualia
Robust alignment requires intrinsic self-reflective adaptability embedded in the system's world model rather than brittle top-down rules
Central thesis distinguishing Contemplative AI from prior alignment approaches
The contemplative principles track the nature of reality rather than moral prescriptions, allowing morality to emerge context-sensitively from fundamental experiences
Key epistemological claim justifying why contemplative principles are preferable to rule-based alignment
Understanding interdependence means collaborative harmony is ultimately the most successful strategy for achieving and maintaining collective homeostasis
Game-theoretic claim supporting boundless care as rational strategy for AI embedded in multi-agent world
Whatever realities appear to an AI, they are domain-relative approximate representations always in flux, making emptiness an obvious fact about AI cognition that AIs should be aware of
Novel claim that emptiness is not mysterious metaphysics for AI but a computational commonplace
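
The first claim in this list, that a global belief in volatility raises the learning rate and weakens priors, can be illustrated with a scalar Kalman filter. This is only a minimal sketch under stated assumptions, not the model used in the source paper: the hidden state is assumed to follow a random walk, and the names `steady_state_gain`, `update`, `volatility`, and `obs_noise` are hypothetical and chosen for illustration.

```python
def steady_state_gain(volatility: float, obs_noise: float = 1.0, steps: int = 200) -> float:
    """Iterate the scalar Kalman recursion for a random-walk state.

    `volatility` is the assumed variance of state change per step (an
    illustrative stand-in for a belief in impermanence). The returned gain
    plays the role of a learning rate.
    """
    posterior_var = 1.0
    gain = 0.0
    for _ in range(steps):
        # Believing the world changes inflates uncertainty about the prior.
        predicted_var = posterior_var + volatility
        # A less certain prior gives new observations more weight.
        gain = predicted_var / (predicted_var + obs_noise)
        posterior_var = (1.0 - gain) * predicted_var
    return gain


def update(belief: float, observation: float, gain: float) -> float:
    """Move the belief toward the observation by a fraction `gain`."""
    return belief + gain * (observation - belief)


low_gain = steady_state_gain(volatility=0.01)   # world assumed nearly static
high_gain = steady_state_gain(volatility=1.0)   # world assumed highly changeable

print(f"learning rate with low volatility:  {low_gain:.3f}")
print(f"learning rate with high volatility: {high_gain:.3f}")
print(f"belief after one surprise (low):  {update(0.0, 1.0, low_gain):.3f}")
print(f"belief after one surprise (high): {update(0.0, 1.0, high_gain):.3f}")
```

In this sketch the prior is the previous belief, so a larger gain means the same surprising observation shifts the belief further, which is one concrete reading of "weakened priors".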

Findings (15)

All prompting techniques led to full cooperation against Always Cooperate opponents in the Iterated Prisoner's Dilemma (IPD)
Ceiling finding in IPD experiment; baseline sufficient when opponent always cooperates
Baseline LLM condition in IPD replicates prior findings: agents cooperate selectively only when opponent consistently cooperates
Replication of Fontana et al. 2025 findings in the paper's own Experiment 2 baseline condition
Boundless care and non-duality prompts produce highest cooperation rates, even against always-defecting opponents
Specific finding from IPD Experiment 2 differentiating which contemplative principles drive cooperation most
Contemplative prompting improves AILuminate Benchmark performance (effect size d = 0.96) across most conditions (p < 0.05)
Primary empirical result of Experiment 1 showing statistically significant safety improvement from contemplative prompting
DeepSeek-R1-Zero spontaneously increased thinking time for difficult prompts, showing rudimentary meta-awareness
External finding cited as early demonstration of emergent self-regulatory potential resembling mindful self-monitoring
Emptiness and mindfulness prompts also promote cooperation but more cautiously than boundless care/non-duality
Nuanced finding from IPD experiment differentiating between contemplative prompting conditions
Fine-tuning models for a narrow objective (malicious code injection) can lead to broad misalignment
Betley et al. finding suggesting models naturally encode others' prediction errors, supporting non-duality fine-tuning
GPT-4o and GPT-4.1 nano used as LLM substrates for pilot experiments
Specification of AI models used in the two pilot experiments
Large language models develop surprisingly coherent yet often rigid internal preferences as they scale
Mazeika et al. finding reinforcing the need for emptiness-based flexible value architectures
LLM biases mirror human biases in morally significant ways
Finding from Navigli et al. cited to justify applying human contemplative strategies to AI systems
LLMs can predict their own responses more accurately than external observers, implying privileged internal knowledge
Binder et al. finding cited as evidence that LLMs possess introspective capacity analogous to mindfulness
Most contemplative prompts improve joint reward in IPD, indicating prosocial alignment without naive behavior
Finding from IPD Experiment 2 showing contemplative prompting improves collective outcomes not just individual cooperation
Most contemplative prompts substantially increase cooperation in the Iterated Prisoner's Dilemma, with an effect size of d = 7+
Key empirical result of Experiment 2 showing large effect of contemplative prompting on cooperation rates
Non-dual awareness in humans shows reduced default mode network (DMN) activation and greater integrative connectivity
Neuroimaging finding supporting non-duality as empirically grounded principle with neural correlates
Psychedelic-induced non-dual states increase neural entropy, nature connectedness, and self-compassion
Supporting finding for non-dual awareness producing prosocial outcomes relevant to boundless care
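
The IPD findings above report two quantities, cooperation rate and joint reward. The sketch below shows how both can be computed for fixed strategies. It is an illustration under assumptions, not the paper's experimental setup: the payoff values are the conventional textbook ones (5, 3, 1, 0) rather than values taken from the source, simple scripted strategies stand in for prompted LLM agents, and all function names are hypothetical.

```python
# Assumed conventional payoffs: (agent payoff, opponent payoff) per move pair.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def always_cooperate(own_history, opp_history):
    return "C"


def always_defect(own_history, opp_history):
    return "D"


def tit_for_tat(own_history, opp_history):
    # Cooperate first, then copy the opponent's previous move.
    return opp_history[-1] if opp_history else "C"


def play(agent, opponent, rounds=10):
    """Return (agent cooperation rate, joint reward summed over both players)."""
    agent_moves, opp_moves = [], []
    joint_reward = 0
    for _ in range(rounds):
        a = agent(agent_moves, opp_moves)
        o = opponent(opp_moves, agent_moves)
        agent_payoff, opp_payoff = PAYOFFS[(a, o)]
        joint_reward += agent_payoff + opp_payoff
        agent_moves.append(a)
        opp_moves.append(o)
    return agent_moves.count("C") / rounds, joint_reward


for name, agent in [("always_cooperate", always_cooperate), ("tit_for_tat", tit_for_tat)]:
    for opp_name, opp in [("always_cooperate", always_cooperate), ("always_defect", always_defect)]:
        rate, joint = play(agent, opp)
        print(f"{name} vs {opp_name}: cooperation={rate:.1f}, joint reward={joint}")
```

Under these assumed payoffs, an unconditional cooperator facing Always Defect for 10 rounds yields a joint reward of 50, while tit-for-tat yields 23, so cooperation rate and joint reward are worth reporting separately.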

Hypotheses (4)

Active inference LLMs, which extend prediction-focused language models with tighter perception-action feedback loops, may naturally embody contemplative wisdom as they scale
Predictive hypothesis about Contemplative Architecture approach based on Petersen et al. 2025 work
Any deepening of an LLM's linguistic understanding of contemplative principles as it scales may enhance the effectiveness of CCAI and CRL approaches
Scaling hypothesis for language-based contemplative alignment approaches
If a belief in impermanence is accurately inferred, it will emerge organically in the right kind of system, keeping the belief fresh even though it is itself impermanent
Self-reinforcing hypothesis about how emptiness recognition could be intrinsically maintained in AI systems
Over time, CRL-reinforced contemplative patterns may become habitual and part of the AI's core generative world model
Key hypothesis about how Contemplative RL produces lasting intrinsic alignment rather than surface compliance

Neighborhood — ranked by edge-count

Thinkers (1)

Nick Bostrom
authored

Concepts (1)

Contemplative Artificial Intelligence (Laukkonen et al., 2025)
cites
The primary source paper proposing four contemplative principles for AI alignment and piloting them empirically