BIG-bench

Large-scale collaborative benchmark for LLM capabilities, cited.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

MT-Bench · method · 0.780
Benchmark used to measure general task performance of LLMs before and after SOO fine-tuning.
EQ-Bench · method · 0.734
Emotional intelligence benchmark (171 problems) used to check whether activation capping degrades soft skills.
larger wholes · concept · 0.707
The broader field of centers that encompasses a given center; a successful center contributes to and is shaped by these larger wholes.
monitors · concept · 0.707
Synchronization construct encapsulating shared data and protected access routines.
Base-10 addition · concept · 0.699
The generic addition mechanism that Llama-3.1-8B actually uses to compute sums before mapping back to cyclic concept space.
Desktop · framework · 0.698
GUI window management construct supporting MDI-style display of applications, used as a top-level backplane facility.
Google · institute · 0.697
Murray Shanahan's part-time employer and provider of LLM technology.
Boundaries · concept · 0.696
The property that living centers are formed and strengthened by boundaries which both separate and unite; the boundary must be of the same order of magnitude as the center being bounded and is itself made of centers.