concept
active
concept:behavioral-null-space
Behavioral Null Space
The span of vector directions that do not change network behavior; a key concept distinguishing MAS from model stitching.
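As a minimal illustration of the definition above, the sketch below assumes a purely linear readout `W` from a hidden state to output logits. That is a simplifying assumption for illustration only and is not how the source defines the concept for full networks. Under it, the behavioral null space coincides with the ordinary null space of `W`. All names (`W`, `h`, `d_null`, `d_other`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear readout: 3 output logits from a 5-dimensional hidden state.
W = rng.normal(size=(3, 5))

# Null-space basis from the SVD: right singular vectors beyond the rank of W.
_, s, vt = np.linalg.svd(W)
rank = int(np.sum(s > 1e-10))
null_basis = vt[rank:].T                     # shape (5, 5 - rank)

h = rng.normal(size=5)                       # an arbitrary hidden state
d_null = null_basis @ rng.normal(size=null_basis.shape[1])  # divergence inside the null space
d_other = vt[0]                              # direction the readout is most sensitive to

# Change in the logits caused by each divergence vector.
change_null = np.linalg.norm(W @ (h + d_null) - W @ h)
change_other = np.linalg.norm(W @ (h + d_other) - W @ h)
```

Here `change_null` is zero up to floating-point error, while `change_other` is not: moving along `d_null` leaves the outputs unchanged, which is the property the definition describes.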
Neighborhood — ranked by edge-count
Papers (1)
paper
Frameworks (1)
framework
- Model Alignment Search (MAS) · associated_with · The primary contribution of the paper: a bidirectional causal method that learns rotation matrices for each model to uncover and compare causally relevant latent subspaces across neural networks.
Methods (1)
method
- Algorithm 1: Harmlessness Classification · implements · Proposed algorithm that uses local PCA to classify a divergence vector as harmless or harmful via behavioral null-space testing.
Concepts (4)
concept
- Behavior Space · related_to · A geometric space of all output token probability distributions, equipped with Hellinger distance, used to visualize model behavior.
- Harmless Divergence · associated_with · Divergences that occur in the behavioral null space and do not affect functional claims about the model.
- Behaviorally Binary Subspace · associated_with · A vector subspace that causally impacts outputs only through the sign of its values, enabling harmless magnitude divergence.
- Dormant Subspace · associated_with · Subspace dimensions that do not vary between inputs; included in the extraneous subspace z_extra.
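The Behavior Space entry above refers to the Hellinger distance between output token probability distributions. A minimal sketch of that distance for discrete distributions follows. The function name `hellinger` and the example distributions are illustrative and do not come from the source.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.

    H(p, q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), which lies in [0, 1].
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

# Illustrative next-token distributions over a 3-token vocabulary.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.7])

same = hellinger(p, p)                        # identical distributions
disjoint = hellinger([1.0, 0.0], [0.0, 1.0])  # non-overlapping support
between = hellinger(p, q)
```

Identical distributions are at distance 0 and distributions with disjoint support are at distance 1, so a divergence that leaves the output distribution unchanged would stay at the same point in such a space.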
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- The traditional space of movement in the physical world where animals exhibit problem-solving behavior.
- A space of physiological parameters that systems navigate for adaptive responses.
- The property that every bit of space swells outward, is substantial in itself, and is never the leftover from an adjacent shape; every single part of space has positive shape as a center with no amorphous meaningless leftovers
- Representation space on which linear probes operate to attribute harmful behaviors to training data.
- Ji-An et al.'s characterization of the limited regime in which model self-report succeeds, consistent with this paper's findings
- The ensemble of all possible configurations of a building, including incomplete states and paths between them.
- Grouping similar model behaviors; the unsupervised method surfaces clusters of concerning patterns.