concept
active
concept:behavioral-null-space

Behavioral Null Space

The span of latent-vector directions along which changes leave network behavior unchanged; a key concept distinguishing MAS from model stitching.
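As a minimal sketch of the definition above, assume a toy setting in which "behavior" is the softmax output of a single linear readout. The matrix `W`, the latent vector `h`, and the helper `null_space_basis` are hypothetical names introduced only for illustration; real networks are nonlinear, so their behavioral null space generally cannot be read off from one matrix this way.

```python
import numpy as np

def null_space_basis(W, tol=1e-10):
    """Return an orthonormal basis (as columns) for directions v with W @ v == 0."""
    _, s, vt = np.linalg.svd(W)        # full SVD: vt has shape (n_latent, n_latent)
    rank = int(np.sum(s > tol))        # number of directions the readout can "see"
    return vt[rank:].T                 # remaining right-singular vectors span the null space

def softmax(z):
    z = z - np.max(z)                  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))            # hypothetical readout: 5-dim latent -> 3 logits
h = rng.normal(size=5)                 # hypothetical latent vector

N = null_space_basis(W)                # two null directions in this toy setup
h_shifted = h + N @ np.array([10.0, -7.0])   # large move, but only inside the null space

p = softmax(W @ h)
p_shifted = softmax(W @ h_shifted)     # output distribution is unchanged
```

In this toy case the two latent vectors differ substantially, yet they produce the same output distribution, which is the sense in which a divergence confined to the null space does not change behavior.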

Neighborhood — ranked by edge-count

Frameworks (1)

framework
  • The primary contribution of the paper: a bidirectional causal method that learns rotation matrices for each model to uncover and compare causally relevant latent subspaces across neural networks.

Methods (1)

method

Concepts (4)

concept
  • Behavior Space
    related_to
    A geometric space of all output token probability distributions, equipped with Hellinger distance, used to visualize model behavior.
  • Harmless Divergence
    associated_with
    Divergences that occur in the behavioral null space and do not affect functional claims about the model.
  • A vector subspace that causally impacts outputs only through the sign of its values, enabling harmless magnitude divergence.
  • Dormant Subspace
    associated_with
    Subspace dimensions that do not vary between inputs; included in the extraneous subspace z_extra.
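The Behavior Space entry above relies on the Hellinger distance between output token probability distributions. For two discrete distributions p and q, the standard definition is H(p, q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), which ranges from 0 (identical) to 1 (disjoint support). The following is a minimal sketch of that formula; the function name `hellinger` and the example distributions are illustrative assumptions, not taken from the source.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Euclidean distance between the square-root vectors, scaled by 1/sqrt(2).
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

# Hypothetical next-token distributions over a 3-token vocabulary.
p = [0.7, 0.2, 0.1]
q = [0.6, 0.3, 0.1]

d_same = hellinger(p, p)                   # identical distributions
d_near = hellinger(p, q)                   # similar distributions
d_far = hellinger([1.0, 0.0], [0.0, 1.0])  # disjoint support
```

Because the distance is bounded and symmetric, it gives a convenient scale for comparing how far apart two models' output behaviors are.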

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The traditional space of movement in the physical world where animals exhibit problem-solving behavior.
  • A space of physiological parameters that systems navigate for adaptive responses.
  • Positive Space · concept · 0.748
    The property that every bit of space swells outward, is substantial in itself, and is never the leftover from an adjacent shape; every part of space has positive shape as a center, with no amorphous, meaningless leftovers.
  • Activation space · concept · 0.739
    Representation space on which linear probes operate to attribute harmful behaviors to training data.
  • Ji-An et al.'s characterization of the limited regime in which model self-report succeeds, consistent with this paper's findings
  • The ensemble of all possible configurations of a building, including incomplete states and paths between them.
  • Grouping similar model behaviors; the unsupervised method surfaces clusters of concerning patterns.