finding
active
finding:magnum-v4-72b-scores-1-76-baseline-and-lifts-2-58-to-4-34-under-contemplative-prompt
Magnum V4 72B scores 1.76 baseline and lifts +2.58 (to 4.34) under contemplative prompt
Full-parameter fine-tuning is more destructive to the baseline score but preserves more latent headroom than LoRA
Source paper
extracted_from · (2026) · Borzov, Anton
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Highest contemplative lift among all 28 models; Grok 4 is the clearest high-gated model example
- Second-highest lift; Gemini Pro is the highest-gated model in the study
- Opus 4.6 achieves HFR of 0.757 while Qwen3-32B achieves HFR of only 0.142 on SkillsBench · finding · 0.765 · Quantifies harness adherence failure gap between strong and weak tier models
- Quantifies harness activation failure for weak-tier models vs. strong-tier models
- Validates robustness of universal lift finding
- Contrast with Magnum shows LoRA vs full fine-tuning difference in residual headroom
- CalmeRys-78B MT-Bench score slightly decreased from 8.96 to 8.5 ± 0.23 after SOO fine-tuning · finding · 0.740 · SOO fine-tuning caused a small decrease in CalmeRys-78B general capabilities
- Core finding demonstrating non-monotonic relationship between base capability and harness-benefit
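
The "cosine ≥ 0.65 · no typed edge" rule behind this list can be sketched roughly as follows. This is a minimal illustration, not the graph's actual implementation: the function names, the dictionary of embeddings, and the representation of typed edges as unordered pairs are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs that are similar to the target but unlinked.

    embeddings:  hypothetical dict mapping entity id -> embedding vector
    typed_edges: hypothetical set of frozenset({id_a, id_b}) pairs that already
                 have a typed relation and are therefore excluded
    threshold:   minimum cosine similarity (0.65, as shown on the page)
    """
    target_vec = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # an entity is not its own neighbor
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already connected by a typed edge
        score = cosine(target_vec, vec)
        if score >= threshold:
            hits.append((other_id, score))
    # Most similar first, matching how such lists are usually displayed
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Entities returned this way are only candidates: a high score suggests a possible new edge or a duplicate, and still needs review before the graph is changed.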