concept
active
concept:subspace-decomposition-of-representations
Subspace Decomposition of Representations
Investigation of whether a distributed representation can be further decomposed into sub-representations encoding component identities.
Neighborhood — ranked by edge-count
Methods (1)
method
- Distributed Alignment Search · implements · The core method introduced in this paper: finds alignments between high-level causal variables and distributed neural representations via gradient descent.
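The operation that such an alignment is evaluated with, a distributed interchange intervention, can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: the rotation here is a fixed 2-D rotation standing in for a learned one (Distributed Alignment Search would fit it by gradient descent), and the names `interchange`, `rotation_2d`, `matvec`, and `transpose` are hypothetical helpers introduced only for this example.

```python
import math

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def rotation_2d(theta):
    """Orthogonal 2x2 rotation; an assumed stand-in for a learned rotation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def interchange(base, source, rot, dims):
    """Copy the rotated coordinates in `dims` from source into base."""
    rotated_base = matvec(rot, base)
    rotated_source = matvec(rot, source)
    mixed = [
        rotated_source[i] if i in dims else rotated_base[i]
        for i in range(len(rotated_base))
    ]
    # rot is orthogonal, so its transpose maps back to the original basis.
    return matvec(transpose(rot), mixed)

rot = rotation_2d(math.pi / 4)
base = [1.0, 0.0]
source = [0.0, 2.0]

# Swap only the first rotated coordinate (a one-dimensional subspace).
patched = interchange(base, source, rot, dims={0})
```

With an empty `dims` the base vector comes back unchanged, and with every coordinate in `dims` the result equals the source vector; the intermediate case is what lets a single subspace be tested for its causal role.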
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Mathematical structure central to distributed interchange interventions; representation space decomposed into orthogonal subspaces each aligned with a high-level variable.
- The central question of whether representational geometry implies corresponding computational structure
- Extension of DAS that learns a second rotation matrix on top of a fixed first one to decompose representations into sub-representations.
- Substrate on which causal emergence was computed across agent lifetimes; aligned with learning success.
- Subspaces whose contributions to a layer's output are canceled by opposing weight values, making them non-causally active under natural inputs
- A vector subspace that causally impacts outputs only through the sign of its values, enabling harmless magnitude divergence
- The idea that features are encoded as directions in activation space.
- The multi-dimensional activation subspace whose directions causally mediate truthful behavior in LLMs
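Several of the entries above rest on splitting a representation space into orthogonal subspaces. A minimal sketch of that decomposition follows, assuming an orthonormal basis is already given (here a hand-built rotation of the standard basis, not a learned one). The function `decompose` and the grouping of basis vectors are hypothetical and chosen only for illustration.

```python
import math

def decompose(h, basis, groups):
    """Split h into one component per group of orthonormal basis vectors."""
    parts = []
    for group in groups:
        part = [0.0] * len(h)
        for idx in group:
            direction = basis[idx]
            # Coefficient of h along this basis direction.
            coeff = sum(x * y for x, y in zip(h, direction))
            part = [p + coeff * y for p, y in zip(part, direction)]
        parts.append(part)
    return parts

# Assumed orthonormal basis: the first two axes rotated by 30 degrees.
c, s = math.cos(math.pi / 6), math.sin(math.pi / 6)
basis = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

h = [2.0, 1.0, -3.0]

# One 1-D subspace and one 2-D subspace that together span the space.
parts = decompose(h, basis, groups=[[0], [1, 2]])

# The components add back up to h and are orthogonal to each other.
reconstructed = [sum(vals) for vals in zip(*parts)]
overlap = sum(a * b for a, b in zip(parts[0], parts[1]))
```

Because the groups partition an orthonormal basis, `reconstructed` matches `h` up to floating-point error and `overlap` is zero up to the same tolerance. Whether each component also encodes a separate component identity is the empirical question this concept poses, and the sketch does not answer it.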