Emotion Subspace

The subspace of activation space spanned by the 171 orthogonalized emotion probe vectors, used to measure how strongly SAE features align with emotion representations

Neighborhood — ranked by edge-count

method

SAE Feature Emotion Subspace Overlap Metric
about
Fraction of an SAE feature vector's norm lying inside the 171-dimensional subspace spanned by the emotion probes, computed by orthogonalizing the probes via SVD and projecting the feature onto the resulting basis
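
The overlap metric described above can be sketched as follows. This is a minimal illustration, not the original implementation: the function names, the 512-dimensional activation size, and the random stand-in probes are all assumptions, and the ratio of norms (rather than squared norms) is chosen to match the "fraction of length" wording.

```python
import numpy as np


def emotion_subspace_basis(probes, tol=1e-10):
    """Orthonormal basis (as rows) for the span of the probe vectors.

    probes: array of shape (n_probes, d_model), one probe per row.
    """
    # The right singular vectors with non-negligible singular values
    # form an orthonormal basis for the row space of `probes`.
    _, s, vt = np.linalg.svd(probes, full_matrices=False)
    rank = int(np.sum(s > tol * s.max()))
    return vt[:rank]


def subspace_overlap(feature, basis):
    """Fraction of `feature`'s norm that lies inside the subspace."""
    coords = basis @ feature  # coordinates of the orthogonal projection
    return float(np.linalg.norm(coords) / np.linalg.norm(feature))


# Hypothetical stand-ins: 171 random "probes" in a 512-dimensional space.
rng = np.random.default_rng(0)
probes = rng.normal(size=(171, 512))
basis = emotion_subspace_basis(probes)

inside = probes[0]                                # lies in the span by construction
random_feature = rng.normal(size=512)             # generic direction
outside = random_feature - basis.T @ (basis @ random_feature)  # orthogonal remainder

overlap_inside = subspace_overlap(inside, basis)
overlap_random = subspace_overlap(random_feature, basis)
overlap_outside = subspace_overlap(outside, basis)
```

Under these assumptions a vector inside the span scores close to 1, a vector orthogonal to it scores close to 0, and a generic direction falls in between.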

concept

Emotion Features in LLMs
associated_with
Internal representations encoding emotion concepts in large language models, identified by probing and SAE methods

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Emotion subspace overlap (SVD-based) · method · 0.818
Metric measuring how much of an SAE feature vector lies within the 171-dimensional subspace spanned by emotion probes, via SVD orthogonalization
Subspace DAS · method · 0.810
Extension of DAS that learns a second rotation matrix on top of a fixed first one to decompose representations into sub-representations.
Subspace Intervention · concept · 0.791
Intervention targeting specific dimensional subsets of activation vectors rather than full representations
Truth Subspace · concept · 0.789
The multi-dimensional activation subspace whose directions causally mediate truthful behavior in LLMs
Behaviorally Binary Subspace · concept · 0.786
A vector subspace that causally impacts outputs only through the sign of its values, enabling harmless magnitude divergence
Balanced Subspaces · concept · 0.753
Subspaces whose contributions to a layer's output are canceled by opposing weight values, making them non-causally active under natural inputs
Two-dimensional truth subspace · framework · 0.752
Burger et al. (2024) framework proposing that truth is linearly decoded along a 2D subspace capturing both polarity-dependent and polarity-invariant directions.
PCs of the emotion space and persistence · concept · 0.750
Analysis showing that lower-rank (more central) PCs of emotion feature activations are more persistent than higher-rank (noisier) PCs