Vision Transformer (ViT)

zhang-2023-multimodal.md

Frontmatter (10 fields)

{
  "doc": "zhang-2023-multimodal.md",
  "context": "Vision encoder used in Multimodal-CoT to extract patch-level features from images.",
  "category": "ai",
  "norm_label": "Vision Transformer (ViT)",
  "graphify_id": "vision_transformer",
  "source_file": "zhang-2023-multimodal.md",
  "imported_from": "/Users/antonborzov/Documents/Research.nosync/papers/extract_typed_out/zhang-2023-multimodal/graph.json",
  "extracted_type": "method",
  "source_location": "L40",
  "graphify_file_type": "method"
}
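
The "patch-level features" in the context field refer to the ViT convention of cutting an image into fixed-size, non-overlapping patches and encoding each patch as one token. The following is a minimal sketch of the patch-splitting step only, assuming a single-channel image stored as a list of rows. The function name `split_into_patches` and the toy sizes are hypothetical and chosen for illustration; they are not taken from Multimodal-CoT, and the learned projection and transformer layers that turn patches into features are omitted.

```python
def split_into_patches(image, patch_size):
    """Split an H x W image (list of rows) into flattened, non-overlapping patches.

    Patches are returned in row-major order (left to right, then top to bottom).
    Hypothetical helper for illustration; not part of any library.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single vector.
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4 x 4 image with pixel values 0..15, split into 2 x 2 patches.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = split_into_patches(image, patch_size=2)
print(len(patches), patches[0])  # 4 patches; the first is [0, 1, 4, 5]
```

With a commonly used ViT configuration (assumed here, not stated in this entry) of a 224 x 224 input and 16 x 16 patches, the same split yields 14 x 14 = 196 patches, so the encoder would emit one feature vector per patch rather than a single vector per image.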

Outgoing (0)

None.

Incoming (1)

Implemented by (1)

Multimodal-CoT (framework)

Mentions (1)

papers-typed
zhang-2023-multimodal.md