community
active
leiden_hybrid_concepts
label: haiku
community:leiden_hybrid_concepts-run4-c13-c4
Multimodal chain-of-thought reasoning benchmarks
ScienceQA and related vision-language tasks evaluated through explicit reasoning steps. Reported accuracies range from 89.7% to 94.8%, and the ScienceQA result comes from a 738M-parameter model.
4 members.
Drawn from 2 sources
The papers/notes whose extracted claims & findings make up this cluster.
Bridges (3)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
Findings (4)
- B10 final accuracy 94.8 ± 1.2%: accuracy at k=16 shots for B10.
- B8 final accuracy 92.4 ± 1.8%: accuracy at k=16 shots for B8.
- B9 final accuracy 89.7 ± 2.1%: accuracy at k=16 shots for B9.
- 90.45% accuracy on ScienceQA benchmark with Multimodal-CoT Large (738M parameters): state-of-the-art result on ScienceQA, an improvement of 3.91 percentage points over the prior best published result of 86.54%.
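
The gain in the ScienceQA finding can be read either as an absolute difference in percentage points or as a relative change. The short sketch below recomputes both from the two accuracies quoted in that finding. The variable names are illustrative and not part of any source.

```python
# Accuracies quoted in the ScienceQA finding, in percent.
new_acc = 90.45    # Multimodal-CoT Large (738M parameters)
prior_acc = 86.54  # prior best published result

# Absolute gain: a plain difference, measured in percentage points.
abs_gain = round(new_acc - prior_acc, 2)

# Relative gain: the same difference expressed as a share of the prior result.
rel_gain = round(100 * (new_acc - prior_acc) / prior_acc, 2)

print(f"absolute gain: {abs_gain} percentage points")
print(f"relative gain: {rel_gain}%")
```

The absolute figure matches the 3.91 quoted in the finding, which suggests the reported improvement is a percentage-point difference rather than a relative one.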