AgentBench

Benchmark evaluating LLMs as interactive agents in tool-use settings, cited.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Agent · concept · 0.808
Any autonomous system including living and non-living forms that embodies a perception-action cycle and tries to navigate and persist in an environment.
Agent architectures · concept · 0.751
The varied neural network architectures used in the RL experiments to test whether the alignment phenomenon generalizes across architectures.
AvalonBench · framework · 0.747
Social deduction game benchmark (Avalon) for LLMs, cited.
Biological agents · concept · 0.747
Natural living systems that have been shown to increase causal emergence after learning, motivating the cross-domain comparison.
agent-environment boundary · concept · 0.747
The demarcation that would separate an agent from its environment, which the paper argues is unevidenceable.
MoralBench · method · 0.744
Benchmark for moral understanding in language models; cited as a relevant existing evaluation tool.
Agent-based modelling · method · 0.744
Computational method used to simulate zombie ant behavior.
Artificial agents · concept · 0.743
Synthetic agents (here RL-trained neural networks) whose causal emergence was previously largely unknown; the paper addresses this gap.
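
The selection rule stated for this list (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. It is not the system's actual implementation: the function names, the dictionary of embeddings, and the representation of typed edges as unordered name pairs are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities near `target` that lack a typed edge.

    embeddings  -- hypothetical dict mapping entity name to its embedding vector
    typed_edges -- hypothetical set of frozenset({a, b}) pairs that already
                   have a typed relation
    threshold   -- minimum cosine similarity; 0.65 matches the cutoff shown above
    """
    hits = []
    for name, vec in embeddings.items():
        if name == target:
            continue  # an entity is not its own neighbor
        if frozenset((target, name)) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, round(score, 3)))
    # Highest similarity first, as in the list above.
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

Calling `related_by_similarity("AgentBench", embeddings, typed_edges)` with real embeddings would yield name and score pairs ordered like the entries above; each result is then a candidate for a new typed edge or a duplicate check.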