Subspace DAS

Extension of Distributed Alignment Search (DAS) that learns a second rotation matrix on top of a fixed first one, in order to decompose an aligned representation into sub-representations.
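
As a rough illustration of the idea, the sketch below composes a fixed first rotation with a second rotation that acts only inside the subspace the first one is assumed to have already isolated. Everything in it is a toy assumption rather than the paper's setup: the 3-dimensional hidden state, the angles, the names `R1`, `R2`, `encode` and `interchange`, and the fact that `R2` is simply set to the correct value instead of being learned.

```python
import math

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def givens(n, i, j, angle):
    """n x n rotation by `angle` in the (i, j) coordinate plane."""
    r = [[float(p == q) for q in range(n)] for p in range(n)]
    c, s = math.cos(angle), math.sin(angle)
    r[i][i], r[i][j], r[j][i], r[j][j] = c, -s, s, c
    return r

# Assumption: R1 stands in for the fixed rotation from an earlier DAS run.
# Its first two coordinates span the aligned subspace, the third is the rest.
R1 = matmul(givens(3, 0, 2, 0.4), givens(3, 1, 2, -0.9))
# Assumption: R2 is the second rotation. It only mixes coordinates 0 and 1,
# i.e. it rotates inside the subspace that R1 already found.
R2 = givens(3, 0, 1, 0.6)
R = matmul(R1, R2)  # combined change of basis

def encode(x1, x2, rest):
    """Toy hidden state: two components stored along the columns of R."""
    return matvec(R, [x1, x2, rest])

def interchange(base, source, dims, rotation):
    """Copy the coordinates in `dims` (in the rotated basis) from source to base."""
    rt = transpose(rotation)
    zb, zs = matvec(rt, base), matvec(rt, source)
    mixed = [zs[k] if k in dims else zb[k] for k in range(len(zb))]
    return matvec(rotation, mixed)

base = encode(1.0, 2.0, 0.5)
source = encode(-3.0, 4.0, -0.25)

# With both rotations, coordinate 0 isolates the first component.
fine = matvec(transpose(R), interchange(base, source, {0}, R))
# With R1 alone, coordinate 0 still mixes the two components.
coarse = matvec(transpose(R), interchange(base, source, {0}, R1))
```

In this toy, `fine` recovers approximately `[-3.0, 2.0, 0.5]`: only the first component is replaced by the source value. `coarse` does not, because without the second rotation a single coordinate of the aligned subspace still carries a mixture of both components.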

Neighborhood — ranked by edge-count

paper

method

Distributed Alignment Search
extends
The core method introduced in this paper: finds alignments between high-level causal variables and distributed neural representations via gradient descent.
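
A heavily simplified sketch of the search itself is given below. It makes several assumptions that are not taken from the source: the hidden state is 2-dimensional, the rotation is a single angle `theta`, the loss compares the patched hidden state to a known counterfactual hidden state (a real setup would typically score the network's output), and the gradient is estimated by finite differences rather than automatic differentiation. The names `hidden`, `PAIRS`, `loss` and `fit` are hypothetical.

```python
import math

def rot(angle, v):
    """Rotate the 2-D vector v counter-clockwise by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

TRUE_ANGLE = 0.7  # assumption: the unknown direction along which `a` is stored

def hidden(a, b):
    """Toy network state: variable of interest `a` and nuisance `b`, rotated."""
    return rot(TRUE_ANGLE, [a, b])

# (base inputs, source inputs) as (a, b) tuples; fixed so the run is deterministic.
PAIRS = [((1.0, 0.0), (0.0, 1.0)),
         ((2.0, -1.0), (-1.0, 0.5)),
         ((0.5, 1.5), (1.5, -0.5))]

def loss(theta):
    """Mean squared error of the interchange intervention under rotation theta."""
    total = 0.0
    for (a_base, b_base), (a_src, b_src) in PAIRS:
        z_base = rot(-theta, hidden(a_base, b_base))  # into the learned basis
        z_src = rot(-theta, hidden(a_src, b_src))
        # Swap coordinate 0 from the source, keep coordinate 1 from the base.
        patched = rot(theta, [z_src[0], z_base[1]])
        # Counterfactual: `a` comes from the source, `b` from the base.
        target = hidden(a_src, b_base)
        total += sum((p - t) ** 2 for p, t in zip(patched, target))
    return total / len(PAIRS)

def fit(steps=300, lr=0.05, eps=1e-5):
    """Plain gradient descent on theta with a central-difference gradient."""
    theta = 0.0
    for _ in range(steps):
        grad = (loss(theta + eps) - loss(theta - eps)) / (2 * eps)
        theta -= lr * grad
    return theta

theta_hat = fit()
```

Starting from `theta = 0.0`, the learned angle moves toward `TRUE_ANGLE`, at which point swapping coordinate 0 in the rotated basis reproduces the counterfactual hidden state. The toy only shows the shape of the optimization; it says nothing about how well the search behaves in high-dimensional models.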

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Emotion Subspace · concept · 0.810
The subspace of activation space spanned by the 171 orthogonalized emotion probe vectors, used to measure SAE feature emotional alignment
Subspace Intervention · concept · 0.798
Intervention targeting specific dimensional subsets of activation vectors rather than full representations
Truth Subspace · concept · 0.787
The multi-dimensional activation subspace whose directions causally mediate truthful behavior in LLMs
Two-dimensional truth subspace · framework · 0.777
Burger et al. (2024) framework proposing that truth is linearly decoded along a 2D subspace capturing both polarity-dependent and polarity-invariant directions.
Balanced Subspaces · concept · 0.777
Subspaces whose contributions to a layer's output are canceled by opposing weight values, making them causally inactive under natural inputs
Behaviorally Binary Subspace · concept · 0.776
A vector subspace that causally impacts outputs only through the sign of its values, enabling harmless magnitude divergence
Subspace Decomposition of Representations · concept · 0.771
Investigation of whether a distributed representation can be further decomposed into sub-representations encoding component identities.
Dormant Subspace · concept · 0.757
Subspace dimensions that do not vary between inputs; included in the extraneous subspace z_extra.