method
active
method:emotion-probes-171-emotion-residual-vector-probes
Emotion probes (171-emotion residual vector probes)
Linear probes constructed to measure 171 emotion concepts in model activations with surface semantic content removed
Neighborhood — ranked by edge-count
Concepts (1)
concept
- layer 40 residual-stream activations · associated_with · The specific neural network layer from which activations are extracted for probe construction and SAE training in the target models
Methods (4)
method
- Statistical method used to analyze neural activity data.
- Metric measuring how much of an SAE feature vector lies within the 171-dimensional subspace spanned by emotion probes, via SVD orthogonalization
- Method used to predict model activations from Gemini embeddings and compute residuals for probe construction
- Used to embed story text so that surface-level semantic content can be regressed out from model activations
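The last two methods above (predicting activations from text embeddings and keeping the residuals) can be sketched as follows. This is a minimal illustration on synthetic data, not the pipeline's actual code: the function names `residualize` and `mean_difference_probe`, the use of ordinary least squares, and the mean-difference probe direction are all assumptions, and the random arrays stand in for Gemini embeddings and layer 40 activations.

```python
import numpy as np


def residualize(activations, embeddings):
    """Remove the part of `activations` that is linearly predictable from `embeddings`.

    Assumption: a plain least-squares fit with a bias term; the source does not
    say which regression variant is used.
    """
    # Append a bias column so the fit can absorb the mean activation.
    X = np.hstack([embeddings, np.ones((embeddings.shape[0], 1))])
    # Solve X @ W ~= activations in the least-squares sense.
    W, *_ = np.linalg.lstsq(X, activations, rcond=None)
    return activations - X @ W


def mean_difference_probe(residuals, labels, emotion):
    """Hypothetical probe: unit vector from 'other stories' to 'stories for this emotion'."""
    mask = labels == emotion
    direction = residuals[mask].mean(axis=0) - residuals[~mask].mean(axis=0)
    return direction / np.linalg.norm(direction)


# Synthetic stand-ins (sizes are arbitrary, chosen only to keep the demo small).
rng = np.random.default_rng(0)
n_stories, d_embed, d_model, n_emotions = 200, 8, 16, 3
embeddings = rng.normal(size=(n_stories, d_embed))      # stand-in for text embeddings
labels = rng.integers(0, n_emotions, size=n_stories)    # emotion label per story
mixing = rng.normal(size=(d_embed, d_model))             # surface-content component
offsets = rng.normal(size=(n_emotions, d_model))         # emotion-specific component
activations = (
    embeddings @ mixing
    + offsets[labels]
    + 0.1 * rng.normal(size=(n_stories, d_model))
)

residuals = residualize(activations, embeddings)
probe = mean_difference_probe(residuals, labels, emotion=0)
```

By construction, the residuals are uncorrelated with every embedding dimension, so whatever a probe built on them picks up cannot be a linear function of the embedded surface content.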
Datasets (1)
dataset
- Over 1000 short stories per emotion generated by models across 100 topics for probe construction
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Core empirical claim distinguishing emotion persistence from generic high-variance probe persistence
- Shows that causal steering effects persist over long ranges for a substantial fraction of emotion probes
- Emotion probe persistence correlation of 0.214 in Cogito v2.1 vs 0.099 for random vectors · finding · 0.769 · Quantifies emotion feature persistence above random baseline in Cogito across 240 multi-turn conversations
- Orthogonalizes the 171 emotion probes via SVD to create an orthonormal basis for computing SAE feature subspace overlap
- Demonstrates that Cogito emotion probes are persistently active beyond what is explained by their variance alone
- Cited as activation-level support for the performing care vs having care distinction the battery detects behaviorally
- Quantitative measure of emotion feature persistence vs random baseline in Cogito
- Caveat on probe interpretation; does not negate the introspection result but affects interpretation of the target variable
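The SVD orthogonalization and subspace-overlap metric mentioned in the entries above can be illustrated with a short sketch. It assumes the overlap is the fraction of a feature vector's squared norm that falls inside the span of the probes; the source does not give the exact formula. The names `orthonormal_basis` and `subspace_overlap` are hypothetical, and the demo uses 5 random probes in 32 dimensions rather than the 171 real ones.

```python
import numpy as np


def orthonormal_basis(probes, tol=1e-10):
    """Return orthonormal rows spanning the same subspace as the rows of `probes`."""
    # Rows of vt with non-negligible singular values span the row space of `probes`.
    _, s, vt = np.linalg.svd(probes, full_matrices=False)
    return vt[s > tol * s.max()]


def subspace_overlap(feature, basis):
    """Fraction of the squared norm of `feature` lying in the span of `basis` (assumed metric)."""
    coords = basis @ feature  # coordinates of the projection in the orthonormal basis
    return float(coords @ coords / (feature @ feature))


rng = np.random.default_rng(0)
probes = rng.normal(size=(5, 32))      # stand-in for the 171 probe vectors
basis = orthonormal_basis(probes)

# A vector built as a combination of probes lies fully inside the subspace.
inside = probes.T @ rng.normal(size=5)
# A vector with its in-subspace component removed lies fully outside it.
v = rng.normal(size=32)
outside = v - basis.T @ (basis @ v)

overlap_inside = subspace_overlap(inside, basis)
overlap_outside = subspace_overlap(outside, basis)
```

Orthogonalizing first matters because the raw probes are generally correlated with one another, so summing squared cosine similarities against them directly would count shared directions more than once.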