method
active
method:svd-orthogonalization-of-emotion-probes
SVD Orthogonalization of Emotion Probes
Orthogonalizes the 171 emotion probes via SVD to create an orthonormal basis for computing SAE feature subspace overlap
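
A minimal sketch of how such a basis and overlap score could be computed with NumPy. This is an illustration under stated assumptions, not the source's implementation: the probes are assumed to be stacked as rows of a `(171, d_model)` matrix, the function names `orthonormal_basis` and `subspace_overlap` and the width `d_model = 512` are hypothetical, and the overlap is taken as the ratio of norms (squaring it would give the fraction of squared length instead).

```python
import numpy as np


def orthonormal_basis(probes: np.ndarray, rtol: float = 1e-10) -> np.ndarray:
    """Return orthonormal rows spanning the same subspace as the probe rows."""
    # Thin SVD: probes = U @ diag(s) @ Vt. The rows of Vt are orthonormal
    # directions in activation space, ordered by singular value.
    _, s, vt = np.linalg.svd(probes, full_matrices=False)
    # Drop directions with negligible singular values, in case some probes
    # are (nearly) linear combinations of others.
    keep = s > rtol * s.max()
    return vt[keep]


def subspace_overlap(feature: np.ndarray, basis: np.ndarray) -> float:
    """Fraction of the feature's length that lies inside the basis subspace."""
    # With orthonormal rows, the projection's coordinates are plain dot
    # products, and the projection's norm equals the norm of the coordinates.
    coords = basis @ feature
    return float(np.linalg.norm(coords) / np.linalg.norm(feature))


# Synthetic stand-ins (assumption): random probes and a random feature vector.
rng = np.random.default_rng(0)
probes = rng.standard_normal((171, 512))
basis = orthonormal_basis(probes)

overlap_inside = subspace_overlap(probes[0], basis)  # vector inside the span
overlap_random = subspace_overlap(rng.standard_normal(512), basis)
```

Orthogonalizing first matters because raw probes are generally correlated, so summing squared cosines against each probe would double-count shared directions; projecting onto an orthonormal basis avoids that.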
Neighborhood — ranked by edge-count
Methods (1)
method
- Fraction of an SAE feature's length lying inside the 171-dimensional subspace spanned by emotion probes, computed via SVD orthogonalization
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Metric measuring how much of an SAE feature vector lies within the 171-dimensional subspace spanned by emotion probes, via SVD orthogonalization
- Linear probes constructed to measure 171 emotion concepts in model activations with surface semantic content removed
- Claim supporting the validity of the probe construction method via cross-validation with self-report
- Core empirical claim distinguishing emotion persistence from generic high-variance probe persistence
- Caveat on probe interpretation; does not negate the introspection result but affects interpretation of the target variable
- Method for building 171 emotion probes by generating stories, embedding them, regressing out Gemini embeddings, and averaging residual activations per emotion
- Text-based and self-steered emotionality ratings for SAE features are correlated at only ρ = +0.051 (n.s.). [finding · 0.721] Shows low agreement between the two evaluation modalities
- Key methodological claim: MM probes are both competitive in accuracy and superior in causal influence