claim

active

claim:the-relationship-between-persistence-and-self-evaluated-emotionality-serves-as-a-replication-of-probe-based-findings-without-shared-confounds-from-probe-construction

The relationship between persistence and self-evaluated emotionality serves as a replication of probe-based findings without shared confounds from probe construction

Claims that agentic self-evaluation provides independent, convergent evidence for the emotion-persistence link

Source paper

extracted_from

Persistence and Introspection of Emotion Features

Scott Sauers · Imago · Janus · Antra Tessera

Neighborhood — ranked by edge-count

Findings (1)

finding

Agentic self-evaluation emotionality correlates with SAE feature persistence: rho=+0.124, p=0.0001
supports
Shows that features Kimi rates as more emotional via self-steering are more persistent, independent of probe construction
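
The finding above reports a "rho" between self-evaluated emotionality and feature persistence. The page does not name the statistic, so the sketch below assumes it is a Spearman rank correlation. The function names and the toy numbers are hypothetical illustrations, not data or code from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical toy values: one entry per SAE feature.
self_rated_emotionality = [1, 2, 3, 4, 5]
persistence = [0.1, 0.3, 0.2, 0.5, 0.4]
rho = spearman_rho(self_rated_emotionality, persistence)
```

A rank-based statistic only asks whether features rated as more emotional tend to be more persistent, without assuming a linear relationship. A small rho such as +0.124 can still carry a very small p-value when the number of features is large.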

Claims (1)

claim

Emotion features in LLMs are genuinely more persistent than variance-matched random features, indicating stateful emotional encoding beyond autoregressive dynamics
supports
Central interpretive claim of the paper supported by multiple convergent analyses

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Is the stronger persistence-emotionality relationship from self-evaluation due to introspection quality, or merely to the ability to test additional steering strengths, including negative ones? · question · 0.855
Open methodological question about the source of the agentic self-evaluation advantage
Emotion probes are more persistent than variance-matched random probes, indicating emotion-specific persistence beyond autoregressive dynamics. · claim · 0.839
Core empirical claim distinguishing emotion persistence from generic high-variance probe persistence
The correlation between emotion subspace fraction and self-evaluated emotionality validates that emotion probe concepts somewhat overlap with the model's self-reported internal emotions. · claim · 0.820
Claim supporting the validity of the probe construction method via cross-validation with self-report
Self-evaluated emotionality and textual evaluation of SAE features predict persistence in opposite directions. · claim · 0.818
Surprising finding that the two evaluation methods diverge in their relationship with persistence
Persistence is not an artifact of probe construction because lower (more central) emotion PCs are more persistent than noisier high-rank PCs · claim · 0.816
Rules out measurement artifact explanation for the persistence finding
When probe and self-report agree and move together causally, confidence in both increases as evidence they track the same underlying state · claim · 0.815
Convergent validity logic applied to LLM interpretability; probes validate self-reports and vice versa
Whether observed persistence reflects a genuine lingering emotion-like state or merely persistent conversational context that produced the emotion-relevant activation · question · 0.811
Core unresolved confound the paper acknowledges but cannot rule out
SAE-based persistence replication of probe-based findings (no shared probe confounds) · claim · 0.807
The SAE self-evaluation persistence finding serves as a replication of probe-based results that shares no potential probe construction confounds