Werewolf benchmark

LLM benchmark based on the communication game Werewolf (cited).

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Safety benchmarks · concept · 0.741
Evaluation framework whose validity is called into question by the presence of eval awareness.
HELM Benchmark · method · 0.722
Existing alignment benchmark mentioned as relevant but insufficient for measuring intrinsic contemplative alignment
AILuminate Benchmark · method · 0.720
Comprehensive AI safety benchmark evaluating resistance to harmful prompts across hazard categories; used in Experiment 1
consciousness benchmarks · concept · 0.718
Benchmarks designed to evaluate AI consciousness, which the paper argues are vulnerable to eval awareness inflation.
Wolfram Research · institute · 0.714
Stephen Wolfram's organization
Turing Test · framework · 0.698
A test of intelligence via linguistic performance; deemed insufficient for sentience assessment by Levin.
strawberry test · concept · 0.695
Eliezer Yudkowsky's benchmark for LLM awareness, mentioned as a test that collapsed-awareness models might fail.
Strength Comparison Task · method · 0.686
Novel task that asks which of two sentences received the stronger injection, using a matched-pairs design to control for positional bias.