MT-Bench

Benchmark used to measure general task performance of LLMs before and after SOO fine-tuning

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

BIG-bench · framework · 0.780
Large-scale collaborative benchmark for LLM capabilities, cited.
MTAdam · method · 0.745
Automatic balancing of multiple training loss terms.
EQ-Bench · method · 0.701
Emotional intelligence benchmark (171 problems) used to check if activation capping degrades soft skills
Aligned-MTL · method · 0.697
Independent component alignment for multi-task learning.
Medium-term memory · concept · 0.688
Vascular clamp's function: holding specific predictions stable over timescales longer than working memory.
Meta CICERO · concept · 0.686
AI system that mastered Diplomacy using deception despite being designed for cooperation; cited as example of AI deception
Meta-learning · concept · 0.685
The capability of GPT-3 to learn tasks from few-shot prompts during runtime.
Massachusetts Institute of Technology · institute · 0.682
Host institution for the Architecture Machine Group and Alexander's early design research.
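
The selection rule stated for this list (cosine ≥ 0.65, no typed edge) can be sketched roughly as follows. This is a minimal illustration, not the page's actual implementation: the function names, the embedding dictionary, and the representation of typed edges as unordered pairs are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities similar to `target` that have no typed edge to it.

    embeddings  -- hypothetical dict: entity name -> embedding vector
    typed_edges -- hypothetical set of frozenset({a, b}) pairs that already share a typed relation
    threshold   -- minimum cosine similarity (0.65, as stated on the page)
    """
    results = []
    for name, vector in embeddings.items():
        if name == target:
            continue
        if frozenset((target, name)) in typed_edges:
            continue  # already linked by a typed relation, so not a candidate
        score = cosine(embeddings[target], vector)
        if score >= threshold:
            results.append((name, score))
    # Highest similarity first, matching the order of the list above.
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter are the ones worth reviewing by hand, either to add a typed edge or to merge as duplicates.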