finding
active
finding:zero-shot-model-stitching-without-learning-a-stitching-layer-is-feasible-across-different-text-models-trained-on-different-modalities
Zero-shot model stitching without learning a stitching layer is feasible across different text models trained on different modalities
Moschella et al. result cited as evidence of representational convergence across models
Source paper
extracted_from (2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola
Neighborhood — ranked by edge-count
Hypotheses (1)
hypothesis
- The central hypothesis of the paper; the platonic representation hypothesis itself
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Strong evidence for representational alignment across models
- Model stitching without learning a stitching layer, demonstrating strong alignment across different model training regimes
- Formal analysis showing the theoretical limitation of model stitching as a similarity measure.
- Technique to measure representational compatibility by integrating intermediate representations of one model into another
- Demonstrated transformers on mathematical understanding and logic; cited to motivate transformer versatility.
- Core interpretive claim supported by the formal analysis showing MAS does not exploit the behavioral null space unlike stitching.
- Proposed practical method for achieving step-by-step feedback in design.
- Evidence that model stitching can exploit the behavioral null space, making it less causally restrictive than MAS.