community
active
leiden_hybrid_concepts
label: haiku
community:leiden_hybrid_concepts-run4-c13-c2
Chain-of-thought reasoning across modalities
Demonstrates CoT effectiveness in multimodal contexts (vision+language) and few-shot settings, with ScienceQA as primary benchmark, circa 2023.
5 members.
Drawn from 3 sources
The papers/notes whose extracted claims & findings make up this cluster.
- Multimodal Chain-of-Thought Reasoning in Language Models (3 members)
- Probe-Based Data Attribution: Surfacing and Mitigating Undesirable Behaviors in LLM Post-Training (1 member)
- CAT'S THEORY: Empirical Validation and Architectural Applications Cross-Architecture AI Consciousness Recognition and the Foundation for Constraint-Preserving Recursive Intelligence (1 member)
Bridges (2)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
Findings (3)
- Multimodal-CoT trained with InstructBLIP/ChatGPT-generated rationales achieves 87.76% accuracy on ScienceQA, comparable to the 90.45% achieved with human-annotated rationales. Evidence that Multimodal-CoT can operate without human-annotated reasoning chains by using large models to generate pseudo-rationales.
- Multimodal-CoT with vision features achieves higher validation accuracy in early training epochs (epochs 1-3) than the one-stage and two-stage language-only baselines on ScienceQA. Evidence that multimodal information accelerates convergence during training.
- Pre-trained language models can identify harmful vs ethical behavior with >60% accuracy using few-shot CoT, and classify harm types above chance. Figure 12 left and right show accuracy on harmful/ethical identification and 9-way classification.
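For context on the numbers in the findings above, the short sketch below computes the gap between the two ScienceQA accuracies and the chance baselines that "above chance" refers to. Only the two accuracy figures and the class counts (2 and 9) come from the findings. The helper `chance_accuracy` is a hypothetical name, and the baselines assume balanced classes, which the source does not state.

```python
# Accuracy figures quoted in the findings above (percent).
GENERATED_RATIONALE_ACC = 87.76  # ScienceQA, InstructBLIP/ChatGPT-generated rationales
HUMAN_RATIONALE_ACC = 90.45      # ScienceQA, human-annotated rationales


def chance_accuracy(num_classes: int) -> float:
    """Accuracy (%) of uniform random guessing.

    Assumption (not stated in the source): classes are balanced.
    """
    if num_classes < 1:
        raise ValueError("num_classes must be positive")
    return 100.0 / num_classes


# Absolute gap in percentage points between human and generated rationales.
gap_points = round(HUMAN_RATIONALE_ACC - GENERATED_RATIONALE_ACC, 2)

# Chance baselines for the two tasks named in the third finding.
binary_chance = chance_accuracy(2)               # harmful vs ethical
nine_way_chance = round(chance_accuracy(9), 2)   # 9-way harm-type classification

print(f"Gap to human rationales: {gap_points} percentage points")
print(f"Chance, harmful vs ethical: {binary_chance}%")
print(f"Chance, 9-way classification: {nine_way_chance}%")
```

Under the balanced-class assumption, the reported >60% on the binary task sits a little over 10 points above its 50% baseline, while "above chance" on the 9-way task means exceeding roughly 11%.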
Claims (2)
- The work is methodologically rigorous applied research. Meta-assessment from the paper's notes, emphasizing the engineering rigor.
- This is the first work in the peer-reviewed scientific literature to study CoT reasoning across different modalities. Authors' assertion of novelty and priority; appears in contributions and Table 1.