framework
active
framework:agentbench
AgentBench
Benchmark evaluating LLMs as interactive agents in tool-use settings, cited.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Any autonomous system, including living and non-living forms, that embodies a perception-action cycle and tries to navigate and persist in an environment.
- The varied neural network architectures used in the RL experiments to test whether the alignment phenomenon generalizes across architectures.
- Social deduction game benchmark (Avalon) for LLMs, cited.
- Natural living systems that have been shown to increase causal emergence after learning, motivating the cross-domain comparison.
- The demarcation that would separate an agent from its environment, which the paper argues is unevidenceable.
- Benchmark for moral understanding in language models; cited as a relevant existing evaluation tool.
- Computational method used to simulate zombie ant behavior.
- Synthetic agents (here RL-trained neural networks) whose causal emergence was previously largely unknown; the paper addresses this gap.