Subspace Decomposition of Representations

Investigation of whether a distributed representation can be further decomposed into sub-representations encoding component identities.

Neighborhood — ranked by edge-count

method

Distributed Alignment Search
implements
The core method introduced in this paper: finds alignments between high-level causal variables and distributed neural representations via gradient descent.
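
As a rough illustration of the operation this method is built around, the sketch below performs a distributed interchange intervention in a rotated basis: the first `k` rotated coordinates of a base activation are replaced with those of a source activation, and the result is rotated back. It assumes a fixed random orthogonal matrix in place of the rotation that DAS would learn by gradient descent, and the names `random_rotation` and `interchange` are hypothetical, not taken from the paper.

```python
import numpy as np

def random_rotation(dim, seed=0):
    """Return a random orthogonal matrix (stand-in for a learned rotation)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def interchange(base, source, rotation, k):
    """Swap the first k rotated coordinates of `base` with those of `source`.

    Columns of `rotation` are treated as basis vectors, so `rotation.T @ h`
    gives the coordinates of h in the rotated basis.
    """
    rotated_base = rotation.T @ base
    rotated_source = rotation.T @ source
    mixed = rotated_base.copy()
    mixed[:k] = rotated_source[:k]   # intervene on the aligned subspace only
    return rotation @ mixed          # map back to the original basis

# Toy activations; in practice these would come from two forward passes.
R = random_rotation(4, seed=0)
base = np.array([1.0, 2.0, 3.0, 4.0])
source = np.array([-1.0, 0.5, 2.0, 0.0])
out = interchange(base, source, R, k=2)
```

In the full method, the rotation would be optimized so that the network's output after this swap matches what the high-level causal model predicts under the corresponding intervention; that training loop is omitted here.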

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Orthogonal Decomposition of Representation Space · concept · 0.789
Mathematical structure central to distributed interchange interventions; representation space decomposed into orthogonal subspaces each aligned with a high-level variable.
Structure in representations · concept · 0.783
The central question of whether representational geometry implies corresponding computational structure.
Subspace DAS · method · 0.771
Extension of DAS that learns a second rotation matrix on top of a fixed first one to decompose representations into sub-representations.
Latent-Space Representations · concept · 0.755
Substrate on which causal emergence was computed across agent lifetimes; aligned with learning success.
Balanced Subspaces · concept · 0.753
Subspaces whose contributions to a layer's output are canceled by opposing weight values, making them non-causally active under natural inputs.
Behaviorally Binary Subspace · concept · 0.752
A vector subspace that causally impacts outputs only through the sign of its values, enabling harmless magnitude divergence.
Linear representation · concept · 0.747
The idea that features are encoded as directions in activation space.
Truth Subspace · concept · 0.747
The multi-dimensional activation subspace whose directions causally mediate truthful behavior in LLMs.
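
The Subspace DAS entry in the list above describes a second rotation learned on top of a fixed first one. A minimal sketch of what such a two-stage intervention could look like follows, based only on that one-line description: the first rotation `r1` selects a `k`-dimensional subspace, and a second `k × k` rotation `r2` re-expresses that subspace so that its first `m` coordinates can be swapped as a sub-representation. Both matrices are random here rather than learned, and `orthogonal` and `sub_interchange` are hypothetical names; the actual method may differ in its details.

```python
import numpy as np

def orthogonal(dim, seed):
    """Return a random orthogonal matrix (stand-in for a learned rotation)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def sub_interchange(base, source, r1, r2, k, m):
    """Swap an m-dimensional sub-representation inside a k-dimensional subspace.

    r1 (d x d) is the fixed first rotation; its first k coordinates span the
    subspace. r2 (k x k) is the second rotation acting only inside it.
    """
    z_base = r1.T @ base
    z_source = r1.T @ source
    u_base = r2.T @ z_base[:k]       # subspace coordinates after second rotation
    u_source = r2.T @ z_source[:k]
    u_base[:m] = u_source[:m]        # intervene on the sub-representation only
    z_base[:k] = r2 @ u_base         # undo the second rotation
    return r1 @ z_base               # undo the first rotation

d, k, m = 6, 4, 2
R1 = orthogonal(d, seed=1)
R2 = orthogonal(k, seed=2)
rng = np.random.default_rng(3)
h_base = rng.standard_normal(d)
h_source = rng.standard_normal(d)
h_out = sub_interchange(h_base, h_source, R1, R2, k, m)
```

Coordinates outside the first `k` rotated dimensions are left untouched, so the intervention stays confined to the subspace found by the first rotation.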