Model Weight Merging

The phenomenon whereby separately trained models of the same architecture converge to the same loss basin, so that their weights can be merged.
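Below is a minimal sketch of the merging step. It assumes that the two models really do share a basin, so that plain parameter averaging is meaningful. The function name `merge_weights` and the dict-of-lists parameter format are hypothetical stand-ins for a framework's real state dictionary. The sketch does not attempt any neuron-permutation alignment.

```python
from typing import Dict, List

# Hypothetical parameter format: parameter name -> flat list of floats.
# Real frameworks store tensors instead (e.g. a PyTorch state_dict).
State = Dict[str, List[float]]


def merge_weights(state_a: State, state_b: State, alpha: float = 0.5) -> State:
    """Linearly interpolate two parameter sets: (1 - alpha) * a + alpha * b.

    Assumes both states come from the same architecture (identical parameter
    names and sizes). No permutation alignment of neurons is performed, so
    the result is only meaningful if the models already share a basin.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if state_a.keys() != state_b.keys():
        raise ValueError("models must share the same parameter names")
    merged: State = {}
    for name, wa in state_a.items():
        wb = state_b[name]
        if len(wa) != len(wb):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise interpolation between the two weight vectors.
        merged[name] = [(1.0 - alpha) * x + alpha * y for x, y in zip(wa, wb)]
    return merged


if __name__ == "__main__":
    a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
    b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
    print(merge_weights(a, b))  # uniform average (alpha = 0.5)
```

With `alpha = 0.5`, this is a uniform average. Sweeping `alpha` from 0 to 1 traces the straight line between the two solutions. Evaluating the loss along that line is one way to probe whether the two models lie in the same basin.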

Neighborhood — ranked by edge-count


cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Model Misalignment · concept · 0.733
The phenomenon of model internals deviating from desired behavior; MAS is demonstrated to detect this by comparing toxic and non-toxic LLMs.
Model Evidence · concept · 0.719
Probability of data under the model, penalizing complexity and rewarding accuracy.
model · concept · 0.718
A representation that captures relevant aspects of a system; according to the theorem, the regulator must embody this.
Model Deception · concept · 0.716
LLM behavior of generating falsehoods; the multi-dimensional truth subspace raises new risks for subtle manipulation.
Equal Weighting · framework · 0.714
Baseline multi-task learning (MTL) approach that minimizes the sum of task losses with equal weights; suffers from task-balancing problems.
Weight matrix decomposition · concept · 0.704
The core idea of decomposing weight matrices into components for interpretability.
"we have shown a mathematical relationship between the two models" · quote · 0.703
Core claim distinguishing this paper's contribution from looser representational similarity arguments.
Big Two Model · framework · 0.702
Meta-trait model grouping OCEAN traits into stability (C, A, reversed N) and plasticity (E, O); used to evaluate covariance patterns from injections.
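The neighborhood filter (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is an illustrative assumption, not this page's actual pipeline. The embeddings, the `untyped_neighbors` helper and the representation of typed edges as name pairs are all hypothetical. Results are sorted by descending cosine score, which is the order the entries above appear in.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def untyped_neighbors(
    target: str,
    embeddings: Dict[str, Vector],          # hypothetical: entity name -> embedding
    typed_edges: Set[Tuple[str, str]],      # hypothetical: (source, destination) pairs
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Return (entity, score) pairs whose cosine similarity to `target` is at
    least `threshold` and that share no typed edge with it (in either
    direction), sorted by descending similarity."""
    linked = {b for a, b in typed_edges if a == target} | {
        a for a, b in typed_edges if b == target
    }
    results: List[Tuple[str, float]] = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue  # skip the entity itself and anything already related
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results
```

Entries that survive both conditions are the ones worth reviewing: either they are missing a typed relation, or they duplicate the target under another name.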