method
active
method:subspace-das

Subspace DAS

Extension of Distributed Alignment Search (DAS) that keeps a first rotation matrix fixed and learns a second rotation matrix on top of it, decomposing representations into sub-representations.
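The composition of the two rotations can be sketched as follows. This is a minimal illustration under stated assumptions, not the source's implementation: it assumes the fixed first rotation exposes an aligned subspace in its first k coordinates, that the second rotation acts only inside that subspace, and that sub-representations are swapped between a base and a source input. All names (random_orthogonal, to_sub_coords, from_sub_coords, interchange, r1, r2, k) are hypothetical, and random orthogonal matrices stand in for matrices that would in practice be learned by gradient descent.

```python
import numpy as np


def random_orthogonal(dim, rng):
    """Draw an orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # flip column signs; q stays orthogonal


def to_sub_coords(h, r1, r2, k):
    """Apply the fixed rotation r1, then rotate the first k coordinates by r2."""
    z = (h @ r1).copy()
    z[..., :k] = z[..., :k] @ r2
    return z


def from_sub_coords(z, r1, r2, k):
    """Invert to_sub_coords using the transposes of the orthogonal matrices."""
    z = z.copy()
    z[..., :k] = z[..., :k] @ r2.T
    return z @ r1.T


def interchange(base, source, r1, r2, k, dims):
    """Copy the chosen sub-representation coordinates from source into base."""
    zb = to_sub_coords(base, r1, r2, k)
    zs = to_sub_coords(source, r1, r2, k)
    zb[..., dims] = zs[..., dims]
    return from_sub_coords(zb, r1, r2, k)


rng = np.random.default_rng(0)
d, k = 8, 4                      # hidden size and aligned-subspace size (assumed)
r1 = random_orthogonal(d, rng)   # stands in for the fixed first rotation
r2 = random_orthogonal(k, rng)   # stands in for the learned second rotation
base, source = rng.standard_normal(d), rng.standard_normal(d)
patched = interchange(base, source, r1, r2, k, dims=[0, 1])
```

In this sketch only r2 would receive gradients during training, while r1 stays frozen. The coordinates listed in dims play the role of one sub-representation.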

Neighborhood — ranked by edge-count

Methods (1)

method
  • The core method introduced in this paper: finds alignments between high-level causal variables and distributed neural representations via gradient descent.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Emotion Subspace · concept · 0.810
    The subspace of activation space spanned by the 171 orthogonalized emotion probe vectors, used to measure SAE feature emotional alignment
  • Intervention targeting specific dimensional subsets of activation vectors rather than full representations
  • Truth Subspace · concept · 0.787
    The multi-dimensional activation subspace whose directions causally mediate truthful behavior in LLMs
  • Burger et al. (2024) framework proposing that truth is linearly decoded along a 2D subspace capturing both polarity-dependent and polarity-invariant directions.
  • Balanced Subspaces · concept · 0.777
    Subspaces whose contributions to a layer's output are canceled by opposing weight values, making them non-causally active under natural inputs
  • A vector subspace that causally impacts outputs only through the sign of its values, enabling harmless magnitude divergence
  • Investigation of whether a distributed representation can be further decomposed into sub-representations encoding component identities.
  • Dormant Subspace · concept · 0.757
    Subspace dimensions that do not vary between inputs; included in the extraneous subspace z_extra.