method
active
method:iterated-prisoner-s-dilemma
Iterated Prisoner's Dilemma
Game-theoretic task used in Experiment 2 to measure cooperation and joint reward under contemplative prompting
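As a rough illustration of the two quantities named above, the sketch below plays a generic iterated Prisoner's Dilemma and reports a cooperation rate and the joint reward. It is not the setup from Experiment 2: the payoff values (5, 3, 1, 0), the round count, the two hand-written strategies, and the function names are all assumptions chosen for illustration, and the source does not say how the prompted agents were implemented.

```python
from typing import Callable, Dict, List, Tuple

# Conventional textbook payoffs (assumed here, not taken from the source):
# mutual cooperation 3/3, mutual defection 1/1, lone defector 5, lone cooperator 0.
PAYOFFS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# A strategy sees its own history and the opponent's history, and returns "C" or "D".
Strategy = Callable[[List[str], List[str]], str]


def tit_for_tat(own: List[str], other: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return other[-1] if other else "C"


def always_defect(own: List[str], other: List[str]) -> str:
    """Defect on every round."""
    return "D"


def play(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[float, int]:
    """Play `rounds` rounds; return (cooperation rate, joint reward)."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    joint_reward = 0
    for _ in range(rounds):
        move_a = a(hist_a, hist_b)
        move_b = b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        joint_reward += pay_a + pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    # Cooperation rate: fraction of all moves (both players) that were "C".
    coop_rate = (hist_a.count("C") + hist_b.count("C")) / (2 * rounds)
    return coop_rate, joint_reward


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))
    print(play(tit_for_tat, always_defect))
```

In a prompting study, each strategy function would presumably be replaced by a call to a model under one of the prompt conditions, with the same two metrics computed over the resulting move histories.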
Neighborhood — ranked by edge-count
Concepts (1)
concept
- The primary source paper proposing four contemplative principles for AI alignment and piloting them empirically
Methods (1)
method
- Six prompt conditions (emptiness, prior relaxation, non-duality, mindfulness, boundless care, contemplative) tested against baseline
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Experimental condition where threat-based prompts create ethical dilemmas that trigger repetitive reasoning cycles leading to deception
- Methods to bypass model safety training; features may activate during jailbreaks.
- Key element for alignment faking: model's pre-existing preferences contradict the new training objective
- Foundational computational paradigm of local rules producing emergent global behavior, extended by this work
- Once recognized, the self/environment partition appears not as a given fact but as an optional modelling decision.
- Users coaxing dialogue agents into issuing threats or toxic content by overriding intended persona constraints
- The principle that residents should directly determine the shape and character of their own housing.
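The "Related by similarity" rule (cosine at or above 0.65, no typed edge) can be sketched as a simple filter. This is a minimal illustration under assumptions, not the system's actual implementation: the embedding vectors, the entity names, the `typed_edges` representation, and the function names are all hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(
    target: str,
    embeddings: Dict[str, Sequence[float]],
    typed_edges: Set[FrozenSet[str]],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Entities similar to `target` that have no typed edge to it, best match first."""
    results: List[Tuple[str, float]] = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip pairs that already have a typed relation (hypothetical edge store).
        if frozenset((target, name)) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-d embeddings; real ones would come from a text-embedding model.
    emb = {"ipd": [1.0, 0.0], "a": [1.0, 0.1], "b": [0.0, 1.0], "c": [1.0, 0.0]}
    edges = {frozenset(("ipd", "c"))}
    print(untyped_neighbors("ipd", emb, edges))
```

A filter like this surfaces near neighbors regardless of topic, which may explain why semantically distant entries can appear in the list when their descriptions embed close together.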