finding
active
17 of 83 emotions tested show significant associations between SAE feature self-evaluation transcripts mentioning the emotion word and higher cosine similarity to that emotion probe; 67 of 83 have positive associations.
Demonstrates partial but reliable validity of self-evaluation for measuring probe emotionality
Source paper
extracted_from: Scott Sauers · Imago · Janus · Antra Tessera
Neighborhood — ranked by edge-count
Claims (1)
claim
- Claim supporting the validity of the probe construction method via cross-validation with self-report
Methods (1)
method
- One-sided permutation test · supports · Statistical test used to evaluate whether SAE features whose self-evaluation transcripts mention an emotion word have higher cosine similarity to that emotion probe
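A minimal sketch of a one-sided permutation test of this kind, for a single emotion. The source does not state the test statistic, the number of permutations, or the data layout, so everything here is an assumption: the statistic is the difference in mean cosine similarity between features whose transcripts mention the emotion word and those that do not, and the names `one_sided_permutation_test`, `similarities`, and `mentions` are hypothetical.

```python
import random
from statistics import mean


def one_sided_permutation_test(similarities, mentions, n_permutations=10_000, seed=0):
    """Test whether 'mention' features have higher mean cosine similarity.

    similarities: cosine similarity of each SAE feature to one emotion probe.
    mentions: booleans, True if that feature's self-evaluation transcript
              mentions the emotion word.
    Returns (observed mean difference, one-sided p-value).
    """
    if len(similarities) != len(mentions):
        raise ValueError("similarities and mentions must have the same length")
    n_mention = sum(1 for m in mentions if m)
    if n_mention == 0 or n_mention == len(mentions):
        raise ValueError("both groups must be non-empty")

    def mean_difference(flags):
        inside = [s for s, f in zip(similarities, flags) if f]
        outside = [s for s, f in zip(similarities, flags) if not f]
        return mean(inside) - mean(outside)

    observed = mean_difference(mentions)

    # Shuffle the mention labels to simulate the null hypothesis that
    # mentioning the word is unrelated to probe similarity.
    rng = random.Random(seed)
    shuffled = list(mentions)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if mean_difference(shuffled) >= observed:
            at_least_as_large += 1

    # The +1 correction counts the observed labeling as one permutation,
    # so the estimated p-value is never exactly zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Toy data for illustration only; these are not values from the source.
toy_similarities = [0.90, 0.80, 0.85, 0.10, 0.20, 0.15, 0.05, 0.10]
toy_mentions = [True, True, True, False, False, False, False, False]
difference, p = one_sided_permutation_test(toy_similarities, toy_mentions, n_permutations=2000)
print(f"mean difference = {difference:.3f}, one-sided p = {p:.4f}")
```

The test is one-sided because only permutations with a difference at least as large as the observed one count against the null, which matches the directional hypothesis that mentioning an emotion word goes with higher similarity to that probe. Running such a test once per emotion would give one p-value for each of the 83 emotions.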
Findings (1)
finding
- Validates that agentic self-evaluation captures genuine emotional content of probes
Questions (1)
question
- Question addressed by testing whether self-evaluation transcripts mentioning emotion words have higher cosine similarity to corresponding probes
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Text-based and self-steered emotionality ratings for SAE features are correlated at only ρ = +0.051 (n.s.). · finding · 0.862 · Shows low agreement between the two evaluation modalities
- Correlation between self-evaluation and textual evaluation of SAE feature emotionality: ρ = +0.051 (n.s.) · finding · 0.854 · Shows that the two evaluation methods for emotionality are largely uncorrelated, indicating they capture different signals
- Explains why variance correction is needed to see the self-evaluation–persistence relationship
- Shows self-evaluated emotionality is negatively confounded by variance, requiring variance control to reveal the true signal
- Surprising finding that the two evaluation methods diverge in their relationship with persistence
- Interprets the near-zero correlation between the two evaluation methods as evidence they capture distinct signals
- Novel finding that agentic self-evaluation of emotionality correlates with feature persistence
- Highest-rated emotional SAE feature; self-report describes overwhelming positive emotional valence
Restated by (1)
cosine ≥ 0.90
Other entities that say roughly the same thing. May be merge candidates or independent restatements across papers.