hypothesis

active

hypothesis:using-more-than-two-models-in-a-mas-comparison-could-harm-alignment-due-to-conflicting-loss-gradients-or-could-assist-in-isolating-causal-subspaces

Using more than two models in a MAS comparison could harm alignment due to conflicting loss gradients, or could assist in isolating causal subspaces

Open question raised in the paper about scaling MAS beyond two models.

Source paper

extracted_from

Model Alignment Search

(2025) · Satchel Grant

Neighborhood — ranked by edge-count

Papers (1)

paper

Model Alignment Search
associated_with

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

MAS reduces number of required alignment matrices for n-model comparison from n(n-1) or n^2 (stitching) to n · finding · 0.845
Key computational efficiency advantage of MAS over traditional model stitching for multi-model comparisons.
MAS is a more causally focused choice than model stitching for addressing questions of how behaviorally relevant information is encoded in different neural systems · claim · 0.785
Core interpretive claim supported by the formal analysis showing MAS does not exploit the behavioral null space unlike stitching.
MAS successfully aligns behavior between Multi-Object GRU models in both embedding and hidden state layers with high IIA · finding · 0.779
Demonstrates MAS's ability to bidirectionally transfer behavior where RSA shows low embedding correlation.
Including within-model interventions (i=j) in MAS training adds a soft constraint encouraging separation of causal from extraneous subspaces · claim · 0.778
Methodological claim about why within-model interchange interventions are essential to the MAS training procedure.
Larger models should amplify bias less than smaller models, with model biases more accurately reflecting data biases rather than exacerbating them · claim · 0.775
Implication of PRH for AI fairness and bias.
The better an LLM is at language modeling, the more it aligns with vision models, and vice versa: linear relationship between language modeling score and vision-language alignment · finding · 0.768
Core cross-modal empirical result: larger and better language models align better with vision models.
Roughness in responses decreases with parameter count within same-alignment model families, operationalizing the cost of polishing. · claim · 0.767
MAS-like methods could potentially be used to directly constrain model internals to be non-toxic · claim · 0.767
Speculative forward-looking claim about practical applications of MAS for model alignment.
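
The matrix-count finding listed above (n(n-1) or n^2 for stitching versus n for MAS) bears directly on scaling a comparison beyond two models, so a small arithmetic sketch may help. It assumes that n(n-1) counts ordered pairs of distinct models and that n^2 additionally counts each model paired with itself; that reading is an interpretation, not something the entry states. The function name `alignment_matrix_counts` and the dictionary keys are hypothetical and chosen only for illustration.

```python
from itertools import product


def alignment_matrix_counts(n_models: int) -> dict:
    """Count alignment matrices needed for an n-model comparison.

    Assumption (not stated in the source entry): stitching learns one
    matrix per ordered (source, target) pair, while MAS learns one
    matrix per model.
    """
    models = range(n_models)
    # Every ordered (source, target) pair, including i == j.
    all_pairs = list(product(models, repeat=2))
    # Ordered pairs of distinct models only.
    distinct_pairs = [(i, j) for i, j in all_pairs if i != j]
    return {
        "stitching_distinct_pairs": len(distinct_pairs),  # n(n-1)
        "stitching_all_pairs": len(all_pairs),            # n^2
        "mas": n_models,                                  # n
    }


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(n, alignment_matrix_counts(n))
```

Under these assumptions the stitching counts grow quadratically with the number of models while the MAS count grows linearly, which is the efficiency gap the finding describes.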