finding

active

finding:claude-haiku-4-5-and-gpt-5-4-nano-have-tc-tightness-0-4-the-tightest-among-all

Claude Haiku 4.5 and GPT-5.4 Nano have TC tightness τ ≈ 0.4, the tightest among all

These two LLMs bargain with minimal overpayment but achieve low overall efficiency.

Source paper

extracted_from

(2026) · Robert Müller · Clemens Müller

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

TrackerAgent and SetRaceAgent have TC tightness τ ≈ 0.2–0.25, looser counters · finding · 0.760
Code agents trade bargaining precision for acquisition pressure.
GPT5.4-N TrueSkill μ=22.6±2.7 · finding · 0.755
GPT-5.4 Nano TrueSkill rating
On SWE-bench, Claude Opus 4.6 and Claude Sonnet 4.6 both achieve 7.4 pp harness-updating gain; Claude Haiku 4.5 achieves 8.0 pp · finding · 0.744
Full evolver-side SWE results showing comparable performance across Claude family tiers
Claude Haiku 4.5 overbid rate 0.87% · finding · 0.740
Haiku's overbid frequency is second highest after G2.5-FL.
Gemini 3 Flash TC bargaining tightness τ ≈ 0.34 · finding · 0.738
G3-F achieves a TC tightness of about 0.34, meaning moderate overpayment in won challenges.
Haiku 4.5 achieves the largest harness-benefit on SkillsBench (15.1 pp) despite mid-tier base capability of 5.8% · finding · 0.728
Shows SB low-base regime is more variable than SWE; Haiku benefits far more than Qwen3-235B despite similar base rates
Claude Haiku 4.5 and EconomyAgent average fewer than 1.7 quartets per game · finding · 0.726
Weak agents complete very few quartets, correlating with low scores.
GPT-5.4 Nano self-bidding rate 74.6% · finding · 0.725
GPT5.4-N also exhibits a high self-bidding propensity.
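
The neighborhood rule stated above (cosine ≥ 0.65, no typed edge) could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the dictionary-of-vectors layout, and the representation of typed edges as unordered ID pairs are hypothetical and are not taken from the source system.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def untyped_neighbors(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs that are close to the target but unlinked.

    embeddings:  hypothetical mapping of entity ID -> embedding vector
    typed_edges: hypothetical set of frozenset({id_a, id_b}) pairs that
                 already carry a typed relation
    """
    target_vec = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue
        # Skip entities that already have a typed relation to the target.
        if frozenset((target_id, other_id)) in typed_edges:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            hits.append((other_id, score))
    # Highest similarity first, matching the descending scores in the list above.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data: "d" is similar to "a" but already linked, "c" is dissimilar.
toy_embeddings = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0], "d": [1.0, 0.2]}
toy_edges = {frozenset(("a", "d"))}
neighbors = untyped_neighbors("a", toy_embeddings, toy_edges)
```

With the toy data, only the entity that is both similar and unlinked survives the filter; such entries are the candidates for new edges or duplicate checks.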