Multimodal-CoT

zhang-2023-multimodal.md

Frontmatter (10 fields)

{
  "doc": "zhang-2023-multimodal.md",
  "context": "A two-stage framework that incorporates vision and language modalities and separates rationale generation from answer inference.",
  "category": "ai",
  "norm_label": "Multimodal-CoT",
  "graphify_id": "multimodal_cot_framework",
  "source_file": "zhang-2023-multimodal.md",
  "imported_from": "/Users/antonborzov/Documents/Research.nosync/papers/extract_typed_out/zhang-2023-multimodal/graph.json",
  "extracted_type": "framework",
  "source_location": "§4",
  "graphify_file_type": "framework"
}

Outgoing (7)

About (2)

A-OKVQA (dataset)
ScienceQA (dataset)

Extends (1)

Chain-of-Thought (CoT) (framework)

Implements (4)

gated fusion (method)
T5 (framework)
two-stage separation of rationale generation and answer inference (framework)
Vision Transformer (ViT) (method)
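
As a rough illustration of how two of the items above fit together, here is a minimal sketch of a gated fusion step and of the two-stage flow (rationale first, then answer). It is a toy under stated assumptions, not the framework's implementation: the function names, the stub callables, and the plain-list vectors are all hypothetical, and the gate uses a common formulation (a sigmoid over two linear projections) that this page does not itself specify. The vision vector is assumed to already have the same dimension as the language vector.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def gated_fusion(h_lang, h_vis, w_lang, w_vis):
    """Blend language and vision features with a learned per-dimension gate.

    Assumed formulation (hypothetical, for illustration only):
        gate  = sigmoid(w_lang @ h_lang + w_vis @ h_vis)
        fused = (1 - gate) * h_lang + gate * h_vis
    """
    lang_proj = matvec(w_lang, h_lang)
    vis_proj = matvec(w_vis, h_vis)
    gate = [sigmoid(a + b) for a, b in zip(lang_proj, vis_proj)]
    return [(1.0 - g) * l + g * v for g, l, v in zip(gate, h_lang, h_vis)]


def two_stage_answer(question, image_features, generate_rationale, infer_answer):
    """Run the two stages in order.

    Stage 1 produces a rationale from the question and image features.
    Stage 2 infers the answer from the question with the rationale appended,
    again alongside the image features. Both callables are hypothetical
    stand-ins for trained models.
    """
    rationale = generate_rationale(question, image_features)
    answer = infer_answer(question + " " + rationale, image_features)
    return rationale, answer


if __name__ == "__main__":
    # With all-zero gate weights the gate is exactly 0.5 in every dimension,
    # so the fused vector is the plain average of the two inputs.
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(gated_fusion([1.0, 3.0], [3.0, 1.0], zeros, zeros))  # [2.0, 2.0]
```

Passing the two stages in as callables keeps the sketch independent of any particular model: the same control flow applies whether both stages share an architecture or not.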

Incoming (6)

About (1)

mm-cot (artifact)

Introduces (1)

Zhuosheng Zhang (thinker)

Supported by (4)

Mentions (1)

papers-typed
zhang-2023-multimodal.md