claim
active
claim:text-based-and-self-steered-emotionality-ratings-are-only-weakly-correlated-0-051-n-s-suggesting-they-measure-different-aspects-of-feature-emotionality
Text-based and self-steered emotionality ratings are only weakly correlated (ρ = +0.051, n.s.), suggesting they measure different aspects of feature emotionality.
Finding that the two evaluation modalities frequently diverge in their interpretation of the same SAE feature
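To make the statistic concrete, here is a minimal sketch of a rank correlation between two sets of per-feature ratings. It assumes ρ denotes Spearman's rank correlation, which this page does not state. The function names and the ratings are hypothetical placeholders, not data or code from the source paper, and no significance test is computed.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical emotionality ratings for the same six SAE features.
text_ratings = [1, 2, 3, 4, 5, 6]      # placeholder text-based ratings
steered_ratings = [3, 6, 1, 5, 2, 4]   # placeholder self-steered ratings
print(spearman_rho(text_ratings, steered_ratings))
```

A value near zero means that ranking features by one modality says little about their ranking under the other, which is the pattern this claim describes.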
Source paper
extracted_from: Scott Sauers · Imago · Janus · Antra Tessera
Neighborhood — ranked by edge-count
Papers (1)
paper
Findings (1)
finding
- Shows low agreement between the two evaluation modalities
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Correlation between self-evaluation and textual evaluation of SAE feature emotionality: rho=+0.051 (n.s.) · finding · 0.838 · Shows that the two evaluation methods for emotionality are largely uncorrelated, indicating they capture different signals
- Interprets the near-zero correlation between the two evaluation methods as evidence they capture distinct signals
- Surprising finding that the two evaluation methods diverge in their relationship with persistence
- Claim supporting the validity of the probe construction method via cross-validation with self-report
- Explains why variance correction is needed to see the self-evaluation–persistence relationship
- Textual evaluation emotionality weakly negatively correlates with SAE feature persistence · finding · 0.803 · Contrasts with positive correlation from agentic self-evaluation, suggesting text and self-evaluation capture different aspects
- Open methodological question about the source of the agentic self-evaluation advantage
- Claims that agentic self-evaluation provides independent convergent evidence for emotion-persistence link
Restated by (1)
cosine ≥ 0.90
Other entities that say roughly the same thing. May be merge candidates or independent restatements across papers.