finding
active
finding:negative-correlation-between-self-evaluated-emotion-persistence-and-sae-feature-activation-variance-explained-rho-0-184-p-4-6e-09
Negative correlation between self-evaluated emotion persistence and SAE feature activation variance explained: rho=-0.184, p=4.6e-09
Shows that self-evaluated emotionality is negatively confounded by variance explained, so a variance control is needed to reveal the true signal
Source paper
extracted_from: Scott Sauers · Imago · Janus · Antra Tessera
Neighborhood — ranked by edge-count
Methods (1)
method
- Controls for variance by sampling random directions from top-k PC spaces matching each emotion probe's explained variance, and subtracting median persistence of 20 matched directions
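The variance control described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the source paper's implementation: the entry does not say how k is chosen or how persistence is computed, so `choose_k`, `random_direction`, `variance_corrected_persistence`, and the `persistence` callable are all hypothetical names. The sketch assumes orthonormal principal components with eigenvalues sorted in descending order, and picks k so that the mean of the top-k eigenvalues (the expected variance of a random unit direction in that subspace) is closest to the probe's explained variance.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence

Vector = List[float]


def choose_k(eigenvalues: Sequence[float], target_variance: float) -> int:
    """Pick k so the mean of the top-k eigenvalues is closest to target_variance.

    Assumption (not from the source): a uniformly random unit direction in the
    span of the top-k PCs has expected variance equal to that mean.
    """
    best_k, best_gap, running = 1, float("inf"), 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        running += lam
        gap = abs(running / k - target_variance)
        if gap < best_gap:
            best_k, best_gap = k, gap
    return best_k


def random_direction(components: Sequence[Vector], k: int, rng: random.Random) -> Vector:
    """Sample a random unit vector in the span of the first k orthonormal PCs."""
    coeffs = [rng.gauss(0.0, 1.0) for _ in range(k)]
    norm = math.sqrt(sum(c * c for c in coeffs))
    dim = len(components[0])
    return [sum(coeffs[i] * components[i][j] for i in range(k)) / norm
            for j in range(dim)]


def variance_corrected_persistence(
    probe: Vector,
    probe_variance: float,
    components: Sequence[Vector],
    eigenvalues: Sequence[float],
    persistence: Callable[[Vector], float],  # hypothetical persistence metric
    n_matched: int = 20,                     # 20 matched directions, per the entry
    seed: int = 0,
) -> float:
    """Probe persistence minus the median persistence of matched random directions."""
    rng = random.Random(seed)
    k = choose_k(eigenvalues, probe_variance)
    baseline = [persistence(random_direction(components, k, rng))
                for _ in range(n_matched)]
    return persistence(probe) - statistics.median(baseline)
```

Subtracting the median rather than the mean of the matched directions keeps the baseline robust to an occasional random direction with unusually high or low persistence.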
Findings (1)
finding
- Explains why variance correction is needed to see the self-evaluation–persistence relationship
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Agentic self-evaluation emotionality correlates with SAE feature persistence: rho=+0.124, p=0.0001 · finding · 0.861 · Shows that features Kimi rates as more emotional via self-steering are more persistent, independent of probe construction
- Shows that model self-report of emotion predicts long-range feature persistence
- Correlation between self-evaluation and textual evaluation of SAE feature emotionality: rho=+0.051 (n.s.) · finding · 0.849 · Shows that the two evaluation methods for emotionality are largely uncorrelated, indicating they capture different signals
- Demonstrates partial but reliable validity of self-evaluation for measuring probe emotionality
- Text-based and self-steered emotionality ratings for SAE features are correlated at only ρ = +0.051 (n.s.). · finding · 0.834 · Shows low agreement between the two evaluation modalities
- Strong positive relationship between emotion alignment and SAE feature persistence in Cogito
- Surprising finding that the two evaluation methods diverge in their relationship with persistence
- Emotion probe persistence correlation of 0.214 in Cogito v2.1 vs 0.099 for random vectors · finding · 0.809 · Quantifies emotion feature persistence above random baseline in Cogito across 240 multi-turn conversations
Restated by (1)
cosine ≥ 0.90
Other entities that say roughly the same thing. May be merge candidates or independent restatements across papers.