claim
active
claim:including-within-model-interventions-i-j-in-mas-training-adds-a-soft-constraint-encouraging-separation-of-causal-from-extraneous-subspaces
Including within-model interventions (i=j) in MAS training adds a soft constraint encouraging separation of causal from extraneous subspaces
Methodological claim about why within-model interchange interventions are essential to the MAS training procedure.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Open question raised in the paper about scaling MAS beyond two models.
- Core interpretive claim supported by the formal analysis showing MAS does not exploit the behavioral null space unlike stitching.
- DAS finds causal effect at all training timesteps including when model is just initialised · finding · 0.765 · Corroborates Wu et al. 2023 finding that DAS expressivity inflates causal effect estimates
- Selective pressure toward convergence via task generality
- Interpretive claim from Case Study II about the distinction between correlational probes and causal interventions
- Central empirical result showing causal coupling between representation and behavior geometry across multiple substrates and modalities.
- Formal analysis showing the theoretical limitation of model stitching as a similarity measure.
- MAS-like methods could potentially be used to directly constrain model internals to be non-toxic · claim · 0.756 · Speculative forward-looking claim about practical applications of MAS for model alignment.