community

active

leiden_hybrid_concepts

label: haiku

community:leiden_hybrid_concepts-run4-c13-c4

Multimodal chain-of-thought reasoning benchmarks

ScienceQA and related vision-language tasks evaluated via explicit reasoning steps; reported results include a 738M-parameter model, with accuracies ranging from about 89% to 95%.

4 members.


Drawn from 2 sources

The papers/notes whose extracted claims & findings make up this cluster.

The Guanyin Protocol: A Framework for Immediately Establishing an Understanding of Both Causality and Compassion in LLM Systems Using Semantic Anchoring (3 members)
Multimodal Chain-of-Thought Reasoning in Language Models (1 member)

Bridges (3)

Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.

Chain-of-Thought reasoning robustness & safety (4 shared)
Benchmark classification accuracy results (3 shared)
Multimodal Chain-of-Thought Reasoning (1 shared)
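A bridge can be read as a non-empty overlap between this community's member set and another community's member set, with the count being the size of that overlap. The sketch below assumes exactly that rule (the page does not state how the explorer computes it), and the member IDs and community names in it are hypothetical placeholders, not the real members of this cluster.

```python
def bridges(members: set[str], others: dict[str, set[str]]) -> dict[str, int]:
    """Map each other community to its number of shared members.

    Communities with no overlap are dropped; the rest are sorted by
    shared-member count, largest first (assumed display order).
    """
    shared = {name: len(members & other) for name, other in others.items()}
    nonzero = {name: n for name, n in shared.items() if n > 0}
    return dict(sorted(nonzero.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical member IDs and community names, for illustration only.
this = {"f1", "f2", "f3", "f4"}
others = {
    "robustness": {"f1", "f2", "f3", "f4", "x9"},
    "accuracy": {"f1", "f2", "f3"},
    "unrelated": {"y1"},
}
print(bridges(this, others))  # {'robustness': 4, 'accuracy': 3}
```

Because the rule is a plain set intersection, a community can share all of its members with a larger one, which is how a 4-member cluster can report "4 shared" with a neighbour.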

Findings (4)

B10 final accuracy: 94.8 ± 1.2% (accuracy at k=16 shots for B10).
B8 final accuracy: 92.4 ± 1.8% (accuracy at k=16 shots for B8).
B9 final accuracy: 89.7 ± 2.1% (accuracy at k=16 shots for B9).
90.45% accuracy on the ScienceQA benchmark with Multimodal-CoT Large (738M parameters). State-of-the-art result on ScienceQA; a +3.91 percentage-point improvement over the prior best published result of 86.54%.
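The gain over the prior best can be stated two ways: as an absolute difference in percentage points, or as a relative change against the old score. The minimal sketch below computes both from the two figures reported above; the function name `improvement` is hypothetical and only illustrates the arithmetic.

```python
def improvement(new_acc: float, old_acc: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new_acc - old_acc           # difference of two percentages
    relative = 100.0 * absolute / old_acc  # gain as a share of the old score
    return absolute, relative


points, rel = improvement(90.45, 86.54)
print(f"+{points:.2f} points, +{rel:.2f}% relative")  # +3.91 points, +4.52% relative
```

The reported +3.91 matches the absolute difference; expressed relative to the 86.54% baseline, the same gain is roughly 4.5%.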