framework
active
framework:agentbench

AgentBench

Benchmark that evaluates LLMs as interactive agents in tool-use settings; cited.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood that have no typed relation to this one. These are candidates for new edges or unrecognized duplicates.

  • Agent · concept · 0.808
    Any autonomous system, living or non-living, that embodies a perception-action cycle and tries to navigate and persist in an environment.
  • The varied neural network architectures used in the RL experiments to test whether the alignment phenomenon generalizes across architectures.
  • AvalonBench · framework · 0.747
    Social deduction game benchmark (Avalon) for LLMs; cited.
  • Biological agents · concept · 0.747
    Natural living systems that have been shown to increase causal emergence after learning, motivating the cross-domain comparison.
  • The demarcation that would separate an agent from its environment, which the paper argues is unevidenceable.
  • MoralBench · method · 0.744
    Benchmark for moral understanding in language models; cited as a relevant existing evaluation tool.
  • Computational method used to simulate zombie ant behavior.
  • Artificial agents · concept · 0.743
    Synthetic agents (here RL-trained neural networks) whose causal emergence was previously largely unknown; the paper addresses this gap.
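The listing rule above (cosine ≥ 0.65 and no typed edge) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the tool's actual implementation. The following are all hypothetical:

* Each entity has a plain embedding vector.
* Typed edges are stored as unordered pairs.
* The helper names `cosine` and `similar_untyped` are illustrative only.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def similar_untyped(target, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: neighbors of `target` with cosine >= threshold
    that share no typed edge with it, sorted by descending score.

    embeddings:  dict mapping entity name -> vector (assumed representation)
    typed_edges: set of frozenset({a, b}) pairs (assumed representation)
    """
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue  # skip the entity itself
        if frozenset((target, name)) in typed_edges:
            continue  # already linked by a typed relation, so not listed
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, round(score, 3)))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results

# Toy usage with made-up 2-D vectors (not real embeddings).
emb = {"AgentBench": [1, 0], "A": [1, 0], "B": [0, 1], "C": [1, 1]}
edges = {frozenset(("AgentBench", "A"))}
print(similar_untyped("AgentBench", emb, edges))  # [('C', 0.707)]
```

In the toy example, "A" is identical to the target but is excluded because it already has a typed edge. "B" falls below the threshold, leaving only "C". That mirrors how the list above surfaces close but unlinked entities.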