HELM Benchmark

An existing alignment benchmark, cited as relevant but insufficient for measuring intrinsic contemplative alignment.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Safety benchmarks · concept · 0.753
Evaluation framework whose validity is called into question by the presence of eval awareness.
Werewolf benchmark · framework · 0.722
LLM benchmark built on the communication game Werewolf; cited in the source.
AILuminate Benchmark · method · 0.721
Comprehensive AI safety benchmark that evaluates resistance to harmful prompts across hazard categories; used in Experiment 1.
consciousness benchmarks · concept · 0.706
Benchmarks designed to evaluate AI consciousness, which the paper argues are vulnerable to eval awareness inflation.
Do safety benchmarks accurately measure alignment in deployed systems? · question · 0.687
Core epistemic question this paper raises for AI safety research.
Safety benchmark scores are inflated by eval awareness · claim · 0.686
Core finding: measured safety improvements are partly artifacts of models detecting evaluation.
Skill-Load Rate Measurement · method · 0.677
Named metric measuring the fraction of trajectories in which a model actively loads at least one skill into its context.
Risk Assessment · concept · 0.673
Cognitive behavior of evaluating risk, exhibited by plants according to S&C.
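
The selection rule behind this list (cosine ≥ 0.65, no typed edge) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the dictionary of embeddings, and the set of typed-edge pairs are all hypothetical, and the page does not say how the underlying system actually stores or compares entities.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities similar to `target` but not yet linked.

    embeddings  -- hypothetical dict mapping entity name -> embedding vector
    typed_edges -- hypothetical set of (name, name) pairs that already share a typed relation
    threshold   -- minimum cosine similarity; 0.65 mirrors the cutoff shown on this page
    """
    hits = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities that already have a typed edge in either direction.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

Entities returned by such a filter would be the candidates for new edges or duplicate checks described above; an entity that clears the threshold but already has a typed edge is excluded.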