claim
active
claim:textual-evaluation-and-agentic-self-evaluation-of-sae-feature-emotionality-measure-different-aspects-of-emotional-content-and-correlate-only-weakly-rho-0-051-n-s
Textual evaluation and agentic self-evaluation of SAE feature emotionality measure different aspects of emotional content and correlate only weakly (rho=+0.051, n.s.)
Interprets the near-zero correlation between the two evaluation methods as evidence they capture distinct signals
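A minimal sketch of how a rank correlation between the two evaluation methods could be computed. It assumes that rho here is a Spearman rank correlation, which the page does not state. The rating lists are hypothetical placeholders rather than data from the source, and the helper names `average_ranks` and `spearman_rho` are illustrative. Significance testing (the "n.s." part) is not covered.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(vx * vy)


# Hypothetical 0-100 emotionality ratings for the same six SAE features.
textual = [10, 80, 35, 60, 20, 90]   # assumed: textual evaluation scores
agentic = [55, 40, 70, 15, 65, 50]   # assumed: agentic self-evaluation scores

rho = spearman_rho(textual, agentic)
print(f"rho = {rho:+.3f}")
```

A value near zero, such as the reported +0.051, means that a feature's rank under one method says almost nothing about its rank under the other.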
Source paper
extracted_from: Scott Sauers · Imago · Janus · Antra Tessera
Neighborhood — ranked by edge-count
Findings (2)
finding
- Correlation between self-evaluation and textual evaluation of SAE feature emotionality: rho=+0.051 (n.s.) · restates · supports · Shows that the two evaluation methods for emotionality are largely uncorrelated, indicating they capture different signals
- Contrasts with positive correlation from agentic self-evaluation, suggesting text and self-evaluation capture different aspects
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Agentic self-evaluation emotionality correlates with SAE feature persistence: rho=+0.124, p=0.0001 · finding · 0.892 · Shows that features Kimi rates as more emotional via self-steering are more persistent, independent of probe construction
- Method where Kimi evaluates steered vs unsteered text samples from another instance to rate SAE feature emotionality (0-100)
- Surprising finding that the two evaluation methods diverge in their relationship with persistence
- Text-based and self-steered emotionality ratings for SAE features are correlated at only ρ = +0.051 (n.s.). · finding · 0.866 · Shows low agreement between the two evaluation modalities
- Shows that model self-report of emotion predicts long-range feature persistence
- Forward-looking claim about the broader utility of the self-steering evaluation method
- Finding that the two evaluation modalities frequently diverge in their interpretation of the same SAE feature
- Explains why variance correction is needed to see the self-evaluation–persistence relationship
Restated by (1)
cosine ≥ 0.90
Other entities that say roughly the same thing. May be merge candidates or independent restatements across papers.