framework
active
framework:gtbench

GTBench

Game-theoretic LLM evaluation benchmark with short-horizon interactions, cited.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
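
The selection rule above (cosine ≥ 0.65 and no typed edge) can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the function names, the `embeddings` mapping and the `typed_edges` set of unordered ID pairs are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold, best first.

    embeddings:  hypothetical dict mapping entity ID -> embedding vector
    typed_edges: hypothetical set of frozenset({id_a, id_b}) pairs that
                 already have a typed relation and are therefore skipped
    """
    target = embeddings[target_id]
    results = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # an entity is not its own neighbor
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed edge
        score = cosine(target, vec)
        if score >= threshold:
            results.append((other_id, round(score, 3)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter are only candidates: a high score with no typed edge may indicate a missing relation, an unrecognized duplicate, or a name-level coincidence.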

  • MoralBench · method · 0.776
    Benchmark for moral understanding in language models; cited as a relevant existing evaluation tool.
  • AgentBench · framework · 0.742
    Benchmark evaluating LLMs as interactive agents in tool-use settings, cited.
  • G Tech · institute · 0.726
  • SkillsBench enforcement mechanism that accepts only single-key JSON actions; composite multi-key actions are rejected, preventing skill loading
  • AvalonBench · framework · 0.695
    Social deduction game benchmark (Avalon) for LLMs, cited.
  • Generative Program · concept · 0.689
    A set of instructions for making something (contrasted with a descriptive blueprint), as in embryonic development.
  • generative process · concept · 0.686
    A process where the whole creates the conditions for the part, following a vital rhythm in which large precedes small.
  • Generative Model · concept · 0.676
    Agent's internal probabilistic model of environment; enables belief inference about hidden states given outcomes.