claim

active

claim:zero-shot-model-stitching-without-a-learned-stitching-layer-is-feasible-because-different-text-models-embed-data-in-remarkably-similar-ways

Zero-shot model stitching without a learned stitching layer is feasible because different text models embed data in remarkably similar ways

Strong evidence for representational alignment across models
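One mechanism that makes stitching without a learned layer possible is the relative-representation idea associated with Moschella et al. (named here for illustration; this page does not itself describe it). Each input is re-expressed by its cosine similarities to a shared set of anchor inputs. Embedding spaces that differ only by a rotation and a rescaling then map to the same relative space. The minimal sketch below assumes synthetic data and treats "model B" as an exact rotation plus rescaling of "model A". Real models are at best approximately related this way. The names `relative_representation`, `emb_a` and `emb_b` are hypothetical.

```python
import numpy as np

def relative_representation(x: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """Represent each row of x by its cosine similarity to each anchor embedding."""
    x_n = x / np.linalg.norm(x, axis=1, keepdims=True)
    a_n = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return x_n @ a_n.T  # shape: (n_samples, n_anchors)

rng = np.random.default_rng(0)
d = 16

# Hypothetical "model A" embeddings for 100 inputs; the first 10 inputs act as anchors.
emb_a = rng.normal(size=(100, d))

# Assumption: "model B" is model A's space under an orthogonal rotation and a global rescaling.
q, _ = np.linalg.qr(rng.normal(size=(d, d)))
emb_b = 3.0 * emb_a @ q

# Each model embeds the same anchor inputs in its own space.
rel_a = relative_representation(emb_a, emb_a[:10])
rel_b = relative_representation(emb_b, emb_b[:10])

# Cosine similarity ignores rotation and scale, so the two relative spaces coincide.
# A downstream head trained on rel_a could therefore consume rel_b with no stitching layer.
print(np.allclose(rel_a, rel_b))
```

The invariance is exact only for the idealised rotation-plus-scale case. For real text models, the claim above amounts to the relative spaces being close enough for a downstream head to transfer.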

Source paper

extracted_from

The Platonic Representation Hypothesis

(2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola

Neighborhood — ranked by edge-count

Concepts (1)

concept

Zero-Shot Model Stitching
associated_with
Model stitching without learning a stitching layer, demonstrating strong alignment across different model training regimes

Claims (1)

claim

There is a growing similarity in how datapoints are represented in different neural network models, spanning different architectures, training objectives, and data modalities
supports
Primary empirical claim of the paper
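"Similarity in how datapoints are represented" can be measured with a mutual nearest-neighbour score. For each input, compare its k nearest neighbours under two models and average the overlap. The minimal sketch below follows the spirit of the mutual-nearest-neighbour alignment metric used in the source paper. It assumes cosine-similarity neighbourhoods and k = 10, both illustrative choices. The function names and synthetic data are hypothetical.

```python
import numpy as np

def knn_indices(feats: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbours of each row under cosine similarity (self excluded)."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)  # a point is never its own neighbour
    return np.argsort(-sim, axis=1)[:, :k]

def mutual_knn_alignment(feats_a: np.ndarray, feats_b: np.ndarray, k: int = 10) -> float:
    """Mean fraction of shared k-nearest neighbours between two representations of the same inputs."""
    nn_a = knn_indices(feats_a, k)
    nn_b = knn_indices(feats_b, k)
    overlaps = [len(set(ra) & set(rb)) / k for ra, rb in zip(nn_a, nn_b)]
    return float(np.mean(overlaps))

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 32))                 # hypothetical features from model A
q, _ = np.linalg.qr(rng.normal(size=(32, 32)))

# A rotation preserves all cosine similarities, so neighbour sets should match.
score_rot = mutual_knn_alignment(x, x @ q)
# Unrelated features: overlap should sit near chance, roughly k / (n - 1).
score_rand = mutual_knn_alignment(x, rng.normal(size=(200, 32)))
print(score_rot, score_rand)
```

The metric depends only on neighbourhood structure, not on coordinates. It can therefore compare models whose embedding dimensions differ.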

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Zero-shot model stitching without learning a stitching layer is feasible across different text models trained on different modalities · finding · 0.955
Moschella et al. result cited as evidence of representational convergence across models
Model stitching can use the behavioral null space of the source model when mapping to the target, making successful stitching insufficient evidence of representational similarity · claim · 0.812
Formal analysis showing the theoretical limitation of model stitching as a similarity measure.
Language models are few-shot learners (Brown et al., 2020) · concept · 0.774
Demonstrated transformer performance on mathematical understanding and logic; cited to motivate transformer versatility.
In some sense, this is the simplest language model we profoundly don't understand. And so it makes a natural target for our paper. · quote · 0.763
Articulates why a one-layer transformer with MLP is the appropriate starting target for mechanistic interpretability
Simple, intentionally rough paper and cardboard models that can be rapidly torn, cut, and patched provide a practical way to evolve design through feedback. · claim · 0.762
Proposed practical method for achieving step-by-step feedback in design.
MAS is a more causally focused choice than model stitching for addressing questions of how behaviorally relevant information is encoded in different neural systems · claim · 0.757
Core interpretive claim supported by the formal analysis showing MAS does not exploit the behavioral null space unlike stitching.
Model Stitching · method · 0.748
Technique to measure representational compatibility by integrating intermediate representations of one model into another
Models might produce first-person experiential language by drawing on human-authored self-descriptions in pretraining data without internally encoding these acts as roleplay · hypothesis · 0.743
Alternative hypothesis for how experience reports arise without explicit performance
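The Model Stitching method and the null-space claim listed above both concern a *learned* stitching layer, in contrast with the zero-shot setting of this claim. The minimal sketch below fits such a layer as an affine least-squares map from one model's intermediate activations to another's. It assumes synthetic activations in which model B's are a noisy linear function of model A's. The names `fit_linear_stitch` and `apply_stitch` are hypothetical, and real stitching layers are usually trained end to end on the downstream task instead.

```python
import numpy as np

def fit_linear_stitch(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Least-squares affine map W such that [src, 1] @ W approximates tgt."""
    src_aug = np.hstack([src, np.ones((src.shape[0], 1))])  # append a bias column
    w, *_ = np.linalg.lstsq(src_aug, tgt, rcond=None)
    return w

def apply_stitch(src: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Map source activations into the target model's space with the fitted layer."""
    src_aug = np.hstack([src, np.ones((src.shape[0], 1))])
    return src_aug @ w

rng = np.random.default_rng(0)
h_a = rng.normal(size=(500, 24))            # hypothetical activations of model A
true_map = rng.normal(size=(24, 32))
# Hypothetical activations of model B at the stitch point, with small noise.
h_b = h_a @ true_map + 0.01 * rng.normal(size=(500, 32))

# Fit on 400 inputs and evaluate on the 100 held-out inputs.
w = fit_linear_stitch(h_a[:400], h_b[:400])
pred = apply_stitch(h_a[400:], w)
rel_err = np.linalg.norm(pred - h_b[400:]) / np.linalg.norm(h_b[400:])
print(f"held-out relative error: {rel_err:.4f}")
```

Per the null-space claim listed above, a low error from a fitted map like this one is not by itself evidence of representational similarity. The zero-shot setting avoids that confound because no map is fitted.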