method
active
method:emotion-subspace-overlap-svd-based
Emotion subspace overlap (SVD-based)
Metric measuring how much of an SAE feature vector lies within the 171-dimensional subspace spanned by emotion probes, via SVD orthogonalization
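The description above can be made concrete with a minimal NumPy sketch. It is an illustration under stated assumptions, not the source's implementation. The function name `emotion_subspace_overlap`, the `rank_tol` parameter, and the array shapes are hypothetical. The sketch also assumes that "how much of the vector lies within the subspace" means the norm of the projection divided by the norm of the feature vector; a squared-norm (variance fraction) convention would differ only by squaring the result.

```python
import numpy as np


def emotion_subspace_overlap(feature, probes, rank_tol=1e-10):
    """Fraction of `feature`'s length lying in the span of the probe vectors.

    feature: shape (d,), e.g. an SAE feature (decoder) direction.
    probes:  shape (k, d), one probe per row (k = 171 in the entity above).
    Assumption: overlap = ||projection onto span|| / ||feature||.
    """
    # SVD orthogonalization: the rows of vt with non-negligible singular
    # values form an orthonormal basis for the span of the probe rows.
    _, s, vt = np.linalg.svd(probes, full_matrices=False)
    basis = vt[s > rank_tol * s.max()]

    # Coordinates of the feature in that orthonormal basis; their norm is
    # the length of the feature's projection onto the subspace.
    coords = basis @ feature
    return float(np.linalg.norm(coords) / np.linalg.norm(feature))


if __name__ == "__main__":
    # Synthetic stand-ins: 171 random "probes" in a 512-dimensional space.
    rng = np.random.default_rng(0)
    probes = rng.normal(size=(171, 512))
    feature = rng.normal(size=512)
    print(emotion_subspace_overlap(feature, probes))
```

Under this convention the value lies in [0, 1]: it is 1 for a vector entirely inside the probe span and 0 for a vector orthogonal to every probe. Orthogonalizing first matters because the raw probes are generally correlated, so summing per-probe cosine similarities would double-count shared directions.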
Neighborhood — ranked by edge-count
Papers (1)
paper
Methods (1)
method
- Linear probes constructed to measure 171 emotion concepts in model activations with surface semantic content removed
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Fraction of an SAE feature's length lying inside the 171-dimensional subspace spanned by emotion probes, computed via SVD orthogonalization
- The subspace of activation space spanned by the 171 orthogonalized emotion probe vectors, used to measure SAE feature emotional alignment
- Orthogonalizes the 171 emotion probes via SVD to create an orthonormal basis for computing SAE feature subspace overlap
- Strong positive relationship between emotion alignment and SAE feature persistence in Cogito
- Claim supporting the validity of the probe construction method via cross-validation with self-report
- SAE feature emotion subspace overlap correlates with persistence in Cogito: Spearman +0.413, p=4.4e-196 · finding · 0.762 · Demonstrates that SAE features more aligned with the emotion subspace are more persistent in Cogito after variance control
- The extent to which a model exhibits similar internal representations when reasoning about itself and others in similar contexts
- Extension of DAS that learns a second rotation matrix on top of a fixed first one to decompose representations into sub-representations.