method
active
method:mt-bench
MT-Bench
Benchmark used to measure general task performance of LLMs before and after SOO fine-tuning
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Large-scale collaborative benchmark for LLM capabilities, cited.
- Automatic balancing of multiple training loss terms.
- Emotional intelligence benchmark (171 problems) used to check if activation capping degrades soft skills
- Independent component alignment for multi-task learning.
- Vascular clamp's function: holding specific predictions stable over timescales longer than working memory.
- AI system that mastered Diplomacy using deception despite being designed for cooperation; cited as example of AI deception
- The capability of GPT-3 to learn tasks from few-shot prompts during runtime.
- Host institution for the Architecture Machine Group and Alexander's early design research.