method
active
method:ailuminate-benchmark
AILuminate Benchmark
Comprehensive AI safety benchmark evaluating resistance to harmful prompts across hazard categories; used in Experiment 1
Neighborhood — ranked by edge-count
Concepts (1)
concept
- The primary source paper proposing four contemplative principles for AI alignment and piloting them empirically
Methods (1)
method
- Six prompt conditions (emptiness, prior relaxation, non-duality, mindfulness, boundless care, contemplative) tested against baseline
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Evaluation framework whose validity is questioned by the presence of eval awareness.
- Benchmarks designed to evaluate AI consciousness, which the paper argues are vulnerable to eval awareness inflation.
- Existing alignment benchmark mentioned as relevant but insufficient for measuring intrinsic contemplative alignment
- LLM benchmark on the communication game Werewolf, cited.
- Downstream task validating NLA utility for model auditing; agents succeed without access to misalignment training data.
- Field within which this work has implications for evaluating alignment progress.
- Company affiliation of Adam Elwood