Emotion subspace overlap (SVD-based)

Metric measuring how much of an SAE feature vector lies within the 171-dimensional subspace spanned by the emotion probes, computed by projecting the feature onto an orthonormal basis obtained from the probes via SVD orthogonalization
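A minimal NumPy sketch of how such an overlap could be computed, not the paper's implementation. It assumes the probes are stacked as rows of a matrix and that the overlap is the ratio of the projected feature's norm to the feature's full norm; squaring that ratio would give the fraction of squared norm instead. The function names, the rank tolerance, and the random stand-in data are all hypothetical.

```python
import numpy as np

def emotion_subspace_basis(probes, rtol=1e-10):
    """Return an orthonormal basis (as rows) for the span of the probes.

    probes: array of shape (n_probes, d_model), one probe vector per row.
    The right singular vectors with non-negligible singular values span
    the same subspace as the probe rows.
    """
    _, s, vt = np.linalg.svd(probes, full_matrices=False)
    rank = int(np.sum(s > rtol * s.max()))  # drop numerically null directions
    return vt[:rank]

def subspace_overlap(feature, basis):
    """Fraction of the feature's length that lies inside the subspace.

    Assumption: overlap = ||P v|| / ||v||, where P projects onto the basis.
    """
    coords = basis @ feature  # coordinates of the projection in the basis
    return float(np.linalg.norm(coords) / np.linalg.norm(feature))

# Random stand-ins for real probes and SAE features (illustrative only).
rng = np.random.default_rng(0)
d_model = 512                                   # hypothetical residual width
probes = rng.standard_normal((171, d_model))    # 171 emotion probes
basis = emotion_subspace_basis(probes)

inside = probes.T @ rng.standard_normal(171)    # a vector built from the probes
random_feature = rng.standard_normal(d_model)   # an unrelated direction

overlap_inside = subspace_overlap(inside, basis)
overlap_random = subspace_overlap(random_feature, basis)
```

A vector built purely from the probes scores close to 1, while an unrelated direction scores strictly between 0 and 1. A value of 0 would mean the feature is orthogonal to every probe.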

Neighborhood — ranked by edge-count

Papers (1)

paper

Persistence and Introspection of Emotion Features
introduces

Methods (1)

method

Emotion probes (171-emotion residual vector probes)
uses
Linear probes constructed to measure 171 emotion concepts in model activations with surface semantic content removed

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

SAE Feature Emotion Subspace Overlap Metric · method · 0.838
Fraction of an SAE feature's length lying inside the 171-dimensional subspace spanned by emotion probes, computed via SVD orthogonalization
Emotion Subspace · concept · 0.818
The subspace of activation space spanned by the 171 orthogonalized emotion probe vectors, used to measure SAE feature emotional alignment
SVD Orthogonalization of Emotion Probes · method · 0.782
Orthogonalizes the 171 emotion probes via SVD to create an orthonormal basis for computing SAE feature subspace overlap
SAE emotion subspace overlap correlates with variance-residualized persistence in Cogito: Spearman +0.413, p = 4.4e-196. · finding · 0.776
Moderate but highly significant positive relationship (Spearman +0.413) between emotion alignment and SAE feature persistence in Cogito
The correlation between emotion subspace fraction and self-evaluated emotionality validates that emotion probe concepts somewhat overlap with the model's self-reported internal emotions. · claim · 0.772
Claim supporting the validity of the probe construction method by cross-checking it against the model's self-report
SAE feature emotion subspace overlap correlates with persistence in Cogito: Spearman +0.413, p=4.4e-196 · finding · 0.762
Demonstrates that SAE features more aligned with the emotion subspace are more persistent in Cogito after variance control
Self-Other Overlap · concept · 0.735
The extent to which a model exhibits similar internal representations when reasoning about itself and others in similar contexts
Subspace DAS · method · 0.733
Extension of DAS that learns a second rotation matrix on top of a fixed first one to decompose representations into sub-representations.