framework
active
framework:werewolf-benchmark

Werewolf benchmark

A cited LLM benchmark based on the communication game Werewolf.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Safety benchmarks · concept · 0.741
    Evaluation framework whose validity is questioned by presence of eval awareness.
  • HELM Benchmark · method · 0.722
    Existing alignment benchmark mentioned as relevant but insufficient for measuring intrinsic contemplative alignment.
  • Comprehensive AI safety benchmark evaluating resistance to harmful prompts across hazard categories; used in Experiment 1.
  • Benchmarks designed to evaluate AI consciousness, which the paper argues are vulnerable to eval awareness inflation.
  • Wolfram Research · institute · 0.714
    Stephen Wolfram's organization.
  • Turing Test · framework · 0.698
    A test of intelligence via linguistic performance; deemed insufficient for sentience assessment by Levin.
  • strawberry test · concept · 0.695
    Eliezer Yudkowsky's benchmark for LLM awareness, mentioned as a test that collapsed-awareness models might fail.
  • Novel task asking which of two sentences received a stronger injection, using matched-pairs design to control for positional bias.
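
The selection rule stated for this list (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. It assumes each entity has an embedding vector and that typed relations are stored as pairs of entity names. The function names, the toy vectors, and the entity keys are hypothetical and are not taken from the system that produced this page.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity, score) pairs at or above the threshold that have
    no typed edge to the target, sorted by descending score."""
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities already connected to the target by a typed relation.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, round(score, 3)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Toy data (hypothetical): 2-D vectors stand in for real embeddings.
embeddings = {
    "werewolf-benchmark": [1.0, 0.0],
    "turing-test": [0.8, 0.6],      # similar enough, no typed edge
    "helm-benchmark": [0.6, 0.8],   # below the 0.65 threshold here
    "unrelated": [0.0, 1.0],        # orthogonal to the target
    "linked": [1.0, 0.1],           # similar, but already has a typed edge
}
typed_edges = {("werewolf-benchmark", "linked")}

neighbors = untyped_neighbors("werewolf-benchmark", embeddings, typed_edges)
print(neighbors)
```

Entities returned by such a filter are the ones a curator would review as candidates for new edges or as possible duplicates.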