concept

active

concept:zheng-et-al-2023-judging-llm-as-a-judge-with-mt-bench-and-chatbot-arena

Zheng et al. 2023 - Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Source paper for the MT-Bench evaluation benchmark, used to assess model capabilities after SOO fine-tuning.

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

The TrueSkill ranking broadly aligns with Chatbot Arena but diverges from reasoning-mode-aggregating evaluations. · claim · 0.773
Comparison to external leaderboards showing misalignment.
LLM judge (deepseek-v3) agrees with human evaluator on 91.6% of 200 sampled jailbreak responses · finding · 0.749
Validates the LLM-based harm evaluation rubric.
LLM-Judge Data Attribution · method · 0.740
Alternative data attribution approach using an LLM as a judge; compared against the probe-based method.
LLM Judge Binary Classifier · method · 0.736
An LLM-based classifier that returns 1 if the response contains a clear subjective experience report and 0 otherwise.
LLM introspection on internal computations is architecturally permitted; whether models leverage this is an empirical question. · claim · 0.730
Core claim directly challenged by prior work denying introspection; forms foundation for Koan Battery introspection studies.
LLMs linearly represent truth-relevant information beyond the plausibility of text, as evidenced by probes trained on likely performing poorly on anti-correlated datasets · claim · 0.730
Establishes that the observed linear structure is not merely a representation of text probability.
LLM alignment score to DINOv2 shows an emergence-esque trend with GSM8K mathematical reasoning performance · finding · 0.729
Alignment predicts math performance with an emergent pattern.
LLM judge evaluation · method · 0.723
Using Claude Sonnet 4 as a grader to categorize model responses according to predefined criteria.
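The neighborhood criterion stated above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `cosine` and `untyped_neighbors`, the dictionary-of-vectors embedding store, and the representation of typed edges as unordered ID pairs are all hypothetical and do not describe how this page was actually generated. Sorting by descending score simply mirrors the order in which the entries are listed.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold that have
    no typed edge to the target, highest score first.

    embeddings:  dict mapping entity ID -> embedding vector (hypothetical store)
    typed_edges: set of frozenset({id_a, id_b}) pairs that already share a typed edge
    """
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # skip the entity itself
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already related by a typed edge
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy data: "d" is similar but already linked; "c" is orthogonal to "a".
toy_embeddings = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0], "d": [1.0, 0.2]}
toy_edges = {frozenset(("a", "d"))}
candidates = untyped_neighbors("a", toy_embeddings, toy_edges)
```

In the toy data only `b` survives both filters, which is the kind of entity the list above flags as a candidate for a new edge or an unrecognized duplicate.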