finding

active

finding:zero-shot-model-stitching-without-learning-a-stitching-layer-is-feasible-across-different-text-models-trained-on-different-modalities

Zero-shot model stitching without learning a stitching layer is feasible across different text models trained on different modalities

Moschella et al. result cited as evidence of representational convergence across models
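The entry does not spell out how stitching can work with no learned layer. The sketch below is a minimal illustration built on two assumptions. First, it uses the relative-representation construction associated with Moschella et al., in which each input is re-encoded by its cosine similarity to a fixed set of anchor inputs. Second, "model B" is a toy that differs from "model A" only by a rotation and rescaling of its embedding space. All function names and numbers are hypothetical. Cosine similarity does not change under rotation or positive scaling, so a decoder fit only on model A's relative coordinates also accepts model B's.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def relative_representation(x, anchors):
    """Re-express an embedding as its cosine similarity to each anchor embedding."""
    return [cosine(x, a) for a in anchors]

def model_b_embed(v, theta=1.1, scale=3.0):
    """Hypothetical 'model B': model A's 2-D embedding rotated by theta and rescaled."""
    c, s = math.cos(theta), math.sin(theta)
    return [scale * (c * v[0] - s * v[1]), scale * (s * v[0] + c * v[1])]

# Hypothetical 2-D embeddings produced by "model A" for three anchors and one query.
anchors_a = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_a = [0.3, 0.9]

# "Model B" places the same inputs at different absolute coordinates.
anchors_b = [model_b_embed(a) for a in anchors_a]
query_b = model_b_embed(query_a)

rel_a = relative_representation(query_a, anchors_a)
rel_b = relative_representation(query_b, anchors_b)

def decoder(rel, weights=(0.5, -1.0, 2.0)):
    """Hypothetical downstream head fit only on model A's relative coordinates."""
    return sum(w * r for w, r in zip(weights, rel))

# Zero-shot stitch: feed model B's relative coordinates to model A's decoder.
print(math.isclose(decoder(rel_a), decoder(rel_b), abs_tol=1e-9))  # prints True
```

Real models are not related by an exact rotation, so their relative coordinates agree only approximately. The sketch shows only why no stitching layer has to be trained when two embedding spaces share the same angle structure.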

Source paper

extracted_from

The Platonic Representation Hypothesis

(2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola

Neighborhood (ranked by edge count)

Hypotheses (1)

hypothesis

Different neural network models trained on different objectives and modalities are converging to a shared statistical model of reality in their representation spaces
supports
The central hypothesis of the paper: the Platonic Representation Hypothesis itself

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.

Zero-shot model stitching without a learned stitching layer is feasible because different text models embed data in remarkably similar ways · claim · 0.955
Strong evidence for representational alignment across models
Zero-Shot Model Stitching · concept · 0.874
Model stitching without learning a stitching layer, demonstrating strong alignment across different model training regimes
Model stitching can use the behavioral null space of the source model when mapping to the target, making successful stitching insufficient evidence of representational similarity · claim · 0.808
Formal analysis showing the theoretical limitation of model stitching as a similarity measure.
Model Stitching · method · 0.761
Technique to measure representational compatibility by integrating intermediate representations of one model into another
Language models are few-shot learners (Brown et al., 2020) · concept · 0.755
Demonstrated transformer performance on mathematical understanding and logic; cited to motivate transformer versatility.
MAS is a more causally focused choice than model stitching for addressing questions of how behaviorally relevant information is encoded in different neural systems · claim · 0.754
Core interpretive claim, supported by the formal analysis showing that, unlike stitching, MAS does not exploit the behavioral null space.
Simple, intentionally rough paper and cardboard models that can be rapidly torn, cut, and patched provide a practical way to evolve design through feedback · claim · 0.740
Proposed practical method for achieving step-by-step feedback in design.
Model stitching achieves nearly perfect IIA even for rank-2 transformation matrices on Multi-Object GRU models · finding · 0.736
Evidence that model stitching can exploit the behavioral null space, making it less causally restrictive than MAS.
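For contrast with the zero-shot case, here is a minimal sketch of ordinary model stitching with a learned linear stitching layer. It assumes toy 2-D activations, a least-squares fit, and a linear downstream head for model B. B_TRUE, HEAD_B, and all helper names are hypothetical. The second half is a simplified linear illustration of why behavioral success alone does not pin down the representation: directions that the downstream readout ignores can change freely. It is not a reproduction of the formal null-space analysis cited in the entries above.

```python
import math

def matvec(m, v):
    """Multiply a 2x2 matrix (list of rows) by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def fit_linear_stitch(src, tgt):
    """Least-squares 2x2 map M minimizing sum ||M s - t||^2 over paired samples.

    Solves the normal equations M = (sum t s^T) (sum s s^T)^{-1}.
    """
    sxx = sum(s[0] * s[0] for s in src)
    sxy = sum(s[0] * s[1] for s in src)
    syy = sum(s[1] * s[1] for s in src)
    det = sxx * syy - sxy * sxy
    gram_inv = [[syy / det, -sxy / det], [-sxy / det, sxx / det]]
    cross = [[sum(t[i] * s[j] for s, t in zip(src, tgt)) for j in range(2)]
             for i in range(2)]
    return [[sum(cross[i][k] * gram_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Hypothetical toy models: model B's layer output is a fixed linear function of
# model A's, and model B's remaining layers are a linear readout.
B_TRUE = [[2.0, 1.0], [0.0, -1.0]]
HEAD_B = [1.0, 1.0]

def head_b(h):
    """Model B's downstream computation (assumed linear for the sketch)."""
    return HEAD_B[0] * h[0] + HEAD_B[1] * h[1]

h_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]  # model A activations
h_b = [matvec(B_TRUE, h) for h in h_a]                    # model B activations

# Learned stitching: fit M on paired activations, then route A's layer -> M -> B's head.
M = fit_linear_stitch(h_a, h_b)
stitched_reps = [matvec(M, h) for h in h_a]
stitched = [head_b(r) for r in stitched_reps]
native = [head_b(h) for h in h_b]
print(all(math.isclose(s, t, abs_tol=1e-9) for s, t in zip(stitched, native)))  # prints True

# Null-space caveat: head_b ignores n = [1, -1] (HEAD_B . n = 0), so shifting the
# stitched representation along n changes it without changing model B's output.
n = [1.0, -1.0]
shifted_reps = [[r[0] + 5.0 * n[0], r[1] + 5.0 * n[1]] for r in stitched_reps]
shifted = [head_b(r) for r in shifted_reps]
print(all(math.isclose(s, t, abs_tol=1e-9) for s, t in zip(shifted, stitched)))  # prints True
```

In this toy, M recovers B_TRUE because the activations are exactly linearly related. With real networks the fit is only approximate, and a stitch is usually judged by the stitched model's downstream performance.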