finding
active
finding:structured-output-failure-rate-below-1-for-all-evaluated-models
Structured-output failure rate below 1% for all evaluated models
JSON parsing errors therefore do not explain the performance gaps between models.
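To show what this metric counts, here is a minimal Python sketch. It assumes that each model output is a raw string, and that an output counts as a valid action only if it parses as a JSON object. The source page does not state the exact parsing criterion, and `structured_output_failure_rate` is a hypothetical helper.

```python
import json


def structured_output_failure_rate(raw_outputs: list[str]) -> float:
    """Fraction of raw model outputs that are not a parseable JSON object.

    Assumption (not from the source): a valid action must decode to a dict;
    valid JSON of any other type (list, number, ...) is counted as a failure.
    """
    if not raw_outputs:
        return 0.0
    failures = 0
    for text in raw_outputs:
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            # Malformed JSON, e.g. a truncated object.
            failures += 1
            continue
        if not isinstance(parsed, dict):
            # Well-formed JSON, but not an action object.
            failures += 1
    return failures / len(raw_outputs)


outputs = [
    '{"action": "click", "target": "#submit"}',
    '{"action": "scroll"',  # truncated: fails to decode
    '[1, 2]',               # valid JSON but not an object
    '{"action": "wait"}',
]
print(structured_output_failure_rate(outputs))
```

Under this criterion, a model falls within the finding's bound when the returned rate is below 0.01.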
Source paper
extracted_from · 2026 · Robert Müller, Clemens Müller
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- key claim about the benchmark's unique diagnostic value
- LLMs reliably produce valid JSON actions.
- Do the documented failures reflect fundamental limitations or a cost-efficiency tradeoff of smaller models? · question · 0.706 · question for future work on frontier models
- A failure mode exposed by the SAE framework where model representations are entangled or collapse under intervention
- design claim about transparency
- Diagnosis of first failure mode explaining low harness-benefit for weak-tier models
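The following minimal Python sketch shows how a similarity panel like this one could be populated. It assumes that entities carry dense embedding vectors and that typed edges are stored as (source, target) pairs. The data layout and `similarity_candidates` are hypothetical; only the 0.65 cosine threshold and the "no typed edge" condition come from this page.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_candidates(entity_id, embeddings, typed_edges, threshold=0.65):
    """Entities with cosine >= threshold and no typed edge to entity_id, best first.

    Hypothetical layout: embeddings maps id -> vector; typed_edges is an
    iterable of (source_id, target_id) pairs, treated as undirected here.
    """
    query = embeddings[entity_id]
    # Neighbours already connected by a typed relation are excluded.
    linked = {b for a, b in typed_edges if a == entity_id}
    linked |= {a for a, b in typed_edges if b == entity_id}
    results = []
    for other_id, vec in embeddings.items():
        if other_id == entity_id or other_id in linked:
            continue
        score = cosine(query, vec)
        if score >= threshold:
            results.append((other_id, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


embeddings = {
    "finding": [1.0, 0.0],
    "a": [1.0, 0.0],
    "b": [0.8, 0.6],
    "c": [0.0, 1.0],
    "d": [0.9, 0.1],
}
typed_edges = [("finding", "d")]
print(similarity_candidates("finding", embeddings, typed_edges))
```

Here "c" falls below the threshold and "d" is dropped because it already has a typed edge. A score such as the 0.706 listed for the open question would pass the 0.65 cutoff.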