framework
active
framework:werewolf-benchmark
Werewolf benchmark
LLM benchmark based on the communication game Werewolf (cited).
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Evaluation framework whose validity is questioned by presence of eval awareness.
- Existing alignment benchmark mentioned as relevant but insufficient for measuring intrinsic contemplative alignment.
- Comprehensive AI safety benchmark evaluating resistance to harmful prompts across hazard categories; used in Experiment 1.
- Benchmarks designed to evaluate AI consciousness, which the paper argues are vulnerable to eval awareness inflation.
- Stephen Wolfram's organization
- A test of intelligence via linguistic performance; deemed insufficient for sentience assessment by Levin.
- Eliezer Yudkowsky's benchmark for LLM awareness, mentioned as test that collapsed-awareness models might fail.
- Novel task asking which of two sentences received a stronger injection, using a matched-pairs design to control for positional bias.