claim
active
claim:mas-is-a-more-causally-focused-choice-than-model-stitching-for-addressing-questions-of-how-behaviorally-relevant-information-is-encoded-in-different-neural-systems
MAS is a more causally focused choice than model stitching for addressing questions of how behaviorally relevant information is encoded in different neural systems
Core interpretive claim supported by the formal analysis showing MAS does not exploit the behavioral null space unlike stitching.
Neighborhood — ranked by edge-count
Findings (2)
finding
- GRU behavior can be compressed to as few as 4 dimensions using DAS and MAS with comparable IIAs · supports · Shows that behaviorally relevant information is low-dimensional; contrasted with model stitching achieving near-perfect IIA at rank 2.
- Key computational efficiency advantage of MAS over traditional model stitching for multi-model comparisons.
Claims (1)
claim
- Formal analysis showing the theoretical limitation of model stitching as a similarity measure.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- MAS-like methods could potentially be used to directly constrain model internals to be non-toxic · claim · 0.787 · Speculative forward-looking claim about practical applications of MAS for model alignment.
- Open question raised in the paper about scaling MAS beyond two models.
- Methodological claim about why within-model interchange interventions are essential to the MAS training procedure.
- Demonstrates MAS's ability to bidirectionally transfer behavior where RSA shows low embedding correlation.
- The primary contribution of the paper: a bidirectional causal method that learns rotation matrices for each model to uncover and compare causally relevant latent subspaces across neural networks.
- Strong evidence for representational alignment across models
- Moschella et al. result cited as evidence of representational convergence across models
- Describes scaffolding method and the model's meta-learning loop.
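
Several entries above refer to interchange interventions through learned rotation matrices. The following is a minimal sketch of that general idea, not the paper's implementation: it assumes each model's hidden state is rotated into an aligned basis, that the first `k` rotated coordinates hold the behaviorally relevant subspace, and that the matrices are already given (here they are random orthogonal matrices rather than learned ones). The names `random_orthogonal`, `interchange`, and all dimensions are hypothetical.

```python
import numpy as np

def random_orthogonal(dim, rng):
    # Stand-in for a learned rotation (assumption): the Q factor of a QR
    # decomposition is orthogonal, so its transpose is its inverse.
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def interchange(h_src, h_tgt, rot_src, rot_tgt, k):
    """Copy the first k aligned coordinates from the source into the target."""
    z_src = rot_src @ h_src        # source state in its aligned basis
    z_tgt = rot_tgt @ h_tgt        # target state in its aligned basis
    z_tgt[:k] = z_src[:k]          # overwrite only the shared subspace
    return rot_tgt.T @ z_tgt       # rotate back to the target's native basis

rng = np.random.default_rng(0)
d1, d2, k = 6, 8, 4                # hypothetical hidden sizes and subspace rank
R1, R2 = random_orthogonal(d1, rng), random_orthogonal(d2, rng)
h1, h2 = rng.standard_normal(d1), rng.standard_normal(d2)

# Cross-model intervention: model 1's subspace patched into model 2.
h2_patched = interchange(h1, h2, R1, R2, k)

# Within-model intervention with identical source and target changes nothing.
h2_same = interchange(h2, h2, R2, R2, k)
```

Under these assumptions the two models may have different hidden sizes, because only the first `k` aligned coordinates are exchanged and the remaining target coordinates are left untouched.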