finding
active
finding:agentic-self-evaluation-emotionality-correlates-with-sae-feature-persistence-rho-0-124-p-0-0001
Agentic self-evaluation emotionality correlates with SAE feature persistence: rho=+0.124, p=0.0001
Shows that features Kimi rates as more emotional via self-steering are more persistent, independent of probe construction
Source paper
extracted_from: Scott Sauers · Imago · Janus · Antra Tessera
Neighborhood — ranked by edge-count
Claims (1)
claim
- Claims that agentic self-evaluation provides independent, convergent evidence for the emotion-persistence link
Methods (1)
method
- Kimi K2.5 uses a tool to steer SAE features on itself in real time and rates the emotional effect on its own internal state on a 0-100 scale
Findings (1)
finding
- Shows that model self-report of emotion predicts long-range feature persistence
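The headline statistic can be illustrated with a minimal sketch, assuming that "rho" here denotes Spearman's rank correlation (the entry does not say so explicitly). The function names and the small data lists below are hypothetical and exist only for illustration; they are not the paper's code or data.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-feature values: self-rated emotionality (0-100)
# and some persistence score. Invented for illustration only.
hypothetical_ratings = [12, 55, 80, 33, 70]
hypothetical_persistence = [0.10, 0.42, 0.35, 0.20, 0.50]
print(spearman_rho(hypothetical_ratings, hypothetical_persistence))
```

Because the statistic depends only on ranks, it is insensitive to how the 0-100 rating scale is spaced, which suits subjective ratings.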
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Interprets the near-zero correlation between the two evaluation methods as evidence they capture distinct signals
- Correlation between self-evaluation and textual evaluation of SAE feature emotionality: rho=+0.051 (n.s.) · finding · 0.875 · Shows that the two evaluation methods for emotionality are largely uncorrelated, indicating they capture different signals
- Explains why variance correction is needed to see the self-evaluation–persistence relationship
- Shows self-evaluated emotionality is negatively confounded by variance, requiring variance control to reveal the true signal
- Surprising finding that the two evaluation methods diverge in their relationship with persistence
- Forward-looking claim about the broader utility of the self-steering evaluation method
- Text-based and self-steered emotionality ratings for SAE features are correlated at only ρ = +0.051 (n.s.). · finding · 0.846 · Shows low agreement between the two evaluation modalities
- Textual evaluation emotionality weakly negatively correlates with SAE feature persistence · finding · 0.832 · Contrasts with positive correlation from agentic self-evaluation, suggesting text and self-evaluation capture different aspects
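Two items in the list above say that self-evaluated emotionality is negatively confounded by variance and that variance control is needed to reveal the signal. The entry does not state which correction was used. One common option, shown here purely as an assumption, is a first-order partial correlation that removes a covariate's contribution from a pairwise correlation. The function name and the input values are hypothetical.

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z.

    Inputs are the three pairwise correlations (for example, rank
    correlations). Assumes |r_xz| < 1 and |r_yz| < 1.
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical illustration of negative confounding (invented numbers):
# x = self-rated emotionality, y = persistence, z = variance.
# The raw x-y correlation is zero, but z relates negatively to x and
# positively to y, which masks a positive x-y relationship.
print(partial_corr(r_xy=0.0, r_xz=-0.5, r_yz=0.5))
```

In this toy case the controlled correlation is positive even though the raw one is zero, which is the general pattern the variance-control items describe.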
Restated by (1)
cosine ≥ 0.90
Other entities that say roughly the same thing. May be merge candidates or independent restatements across papers.