framework
active
framework:model-alignment-search-mas

Model Alignment Search (MAS)

The paper's primary contribution: a bidirectional causal method that learns a rotation matrix for each model in order to uncover and compare causally relevant latent subspaces across neural networks.
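The rotation-based, bidirectional idea can be made concrete with a minimal NumPy sketch of a cross-model interchange intervention. It assumes each model's latent is a single vector, that the aligned causal subspace occupies the first `k` rotated coordinates, and that the rotations are already fixed. Learning them, which is the part MAS actually optimizes, is omitted. The names `random_orthogonal` and `cross_model_interchange` are illustrative, not taken from the paper.

```python
import numpy as np


def random_orthogonal(dim, rng):
    """Return a random dim x dim orthogonal matrix (stand-in for a learned rotation)."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    # Flip column signs so the factor does not depend on QR sign conventions.
    return q * np.sign(np.diag(r))


def cross_model_interchange(h_target, h_source, q_target, q_source, k):
    """Overwrite the target latent's first k rotated coordinates with the source's.

    Convention (an assumption of this sketch): z = Q.T @ h maps a native latent
    into the aligned basis, and h = Q @ z maps it back.
    """
    z_target = q_target.T @ h_target      # target latent in its aligned basis (new array)
    z_source = q_source.T @ h_source      # source latent in its aligned basis
    z_target[:k] = z_source[:k]           # transplant the aligned causal subspace
    return q_target @ z_target            # back to the target model's native basis


# Usage: two models with different widths, intervened on in both directions.
rng = np.random.default_rng(0)
d_a, d_b, k = 6, 4, 2
q_a, q_b = random_orthogonal(d_a, rng), random_orthogonal(d_b, rng)
h_a, h_b = rng.standard_normal(d_a), rng.standard_normal(d_b)
h_a_patched = cross_model_interchange(h_a, h_b, q_a, q_b, k)   # B -> A
h_b_patched = cross_model_interchange(h_b, h_a, q_b, q_a, k)   # A -> B
```

In a full pipeline each patched latent would be passed through the remainder of its own model and scored against the counterfactual behavior; running both directions mirrors the bidirectional framing above. Because the swap happens in rotated coordinates, the two models may have different widths as long as `k` fits both.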

Neighborhood, ranked by edge count

Papers (1)


Methods (3)

  • Fundamental operation in causal abstraction analysis: it sets selected neurons to the values they take on a source input, producing counterfactual behavior.
  • Technique for measuring representational compatibility by feeding the intermediate representations of one model into another.
  • Learnable invertible transformation in DAS/MAS that rotates latent vectors so that causal variables occupy aligned subspaces; restricted to orthogonal matrices Q.
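Two of the methods above can be sketched together in NumPy: a plain neuron-level interchange intervention that copies selected values from a source run, and a way to keep the rotation Q orthogonal while it is learned. The source does not say how MAS enforces orthogonality; the Cayley transform used here is one standard choice (PyTorch's `torch.nn.utils.parametrizations.orthogonal` offers it alongside matrix-exponential and Householder maps). The function names `interchange` and `cayley_orthogonal` are hypothetical.

```python
import numpy as np


def interchange(h_base, h_source, neuron_idx):
    """Neuron-level interchange intervention: copy the listed neurons from the source run."""
    h = h_base.copy()
    h[neuron_idx] = h_source[neuron_idx]
    return h


def cayley_orthogonal(params):
    """Map an unconstrained square matrix to an orthogonal Q via the Cayley transform."""
    a = params - params.T                      # skew-symmetric matrix built from params: a.T == -a
    eye = np.eye(params.shape[0])
    # I + a is invertible for any real skew-symmetric a, so this solve does not fail.
    return np.linalg.solve(eye + a, eye - a)   # Q = (I + a)^{-1} (I - a)


rng = np.random.default_rng(0)
Q = cayley_orthogonal(rng.standard_normal((4, 4)))
h_base, h_source = rng.standard_normal(4), rng.standard_normal(4)
patched = interchange(h_base, h_source, [0, 2])   # neurons 0 and 2 now come from the source
```

Every Q produced this way is orthogonal, so updates to the unconstrained `params` never leave the set of rotations. One design caveat: the Cayley map cannot produce an orthogonal matrix with eigenvalue -1, so it covers most, but not all, of the orthogonal group.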

Concepts (5)

  • Behavioral Null Space
    associated_with
    The span of vector directions that do not change network behavior; a key concept distinguishing MAS from model stitching.
  • Evaluation metric measuring how well a trained intervention reproduces the desired counterfactual model behavior.
  • Contiguous subspace of the aligned latent vector encoding behaviorally relevant information for a specific causal variable.
  • Similarity measured with respect to network behavior/function rather than statistical correlation of activations.
  • The desired property of a bidirectional, behavior-preserving mapping between model representations; the goal MAS pursues.
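Two of the concepts above can be illustrated with toy NumPy code under deliberately simplifying assumptions. The first function scores an intervention in the way the evaluation metric is described, as the fraction of examples whose post-intervention prediction matches the counterfactual label (assuming classification outputs). The second shows a behavioral null space for a purely linear readout `W`, where that space is exactly the kernel of `W`; real networks are nonlinear, so this only builds intuition. Both function names are hypothetical.

```python
import numpy as np


def intervention_accuracy(intervened_logits, counterfactual_labels):
    """Fraction of examples whose post-intervention prediction equals the counterfactual label."""
    preds = np.argmax(intervened_logits, axis=-1)
    return float(np.mean(preds == counterfactual_labels))


def null_space_basis(W, tol=1e-10):
    """Columns span ker(W): directions that leave the linear readout W @ h unchanged."""
    _, s, vt = np.linalg.svd(W)                # full_matrices=True by default
    rank = int(np.sum(s > tol))
    return vt[rank:].T


logits = np.array([[2.0, 1.0], [0.0, 3.0], [1.0, 0.0]])
labels = np.array([0, 1, 1])
acc = intervention_accuracy(logits, labels)    # predictions [0, 1, 0] -> 2/3

W = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # toy readout that ignores the third coordinate
N = null_space_basis(W)
h = np.array([0.5, -1.0, 2.0])
unchanged = np.allclose(W @ (h + 3.0 * N[:, 0]), W @ h)  # True
```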

Questions (3)


Frameworks (2)


Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.