finding

active

finding:cogito-emotion-probe-residual-autocorrelation-0-077-above-variance-matched-controls-p-1-5e-27-157-171-probes-positive

Cogito emotion probe residual autocorrelation +0.077 above variance-matched controls (p=1.5e-27, 157/171 probes positive)

Demonstrates that Cogito emotion probes are persistently active beyond what is explained by their variance alone

Source paper

extracted_from

Persistence and Introspection of Emotion Features

Scott Sauers · Imago · Janus · Antra Tessera

Neighborhood — ranked by edge-count

Claims (1)

claim

Emotion features in LLMs are genuinely more persistent than variance-matched random features, indicating stateful emotional encoding beyond autoregressive dynamics
supports
Central interpretive claim of the paper supported by multiple convergent analyses

Hypotheses (1)

hypothesis

We hypothesize that emotion states are more persistent because they correspond to genuinely stateful internal representations, not merely local surface content
associated_with
Proposed explanation for why emotion probes are more persistent than variance-matched random probes

Methods (1)

method

Variance-Matched Random Probe Comparison
introduces
Controls for variance by sampling random directions from top-k principal-component (PC) subspaces chosen to match each emotion probe's explained variance, then subtracting the median persistence of 20 matched directions from the probe's own persistence
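
A minimal sketch of how such a variance-matched control could be computed, not the paper's actual code. Everything beyond the description above is an assumption: persistence is taken to be the Pearson correlation between a direction's projection at an early and a late token position, and "matching" is done by picking the k whose top-k eigenvalue mean (the expected variance of a random unit direction in that subspace) is closest to the probe's variance. The names `persistence`, `residual_persistence`, `acts_early`, and `acts_late` are hypothetical.

```python
import numpy as np


def persistence(acts_early, acts_late, direction):
    """Pearson correlation of a direction's projection at two token positions.

    acts_early, acts_late: (n_conversations, d) activation matrices (assumed layout).
    """
    a = acts_early @ direction
    b = acts_late @ direction
    return float(np.corrcoef(a, b)[0, 1])


def residual_persistence(acts_early, acts_late, probe, n_controls=20, seed=0):
    """Probe persistence minus the median persistence of variance-matched controls."""
    rng = np.random.default_rng(seed)
    probe = probe / np.linalg.norm(probe)

    # PCA of the pooled, centered activations via SVD (rows of vt are PCs).
    pooled = np.vstack([acts_early, acts_late])
    pooled = pooled - pooled.mean(axis=0)
    _, s, vt = np.linalg.svd(pooled, full_matrices=False)
    eigvals = s**2 / (pooled.shape[0] - 1)

    # Variance of the activations along the probe direction.
    probe_var = float(np.var(pooled @ probe, ddof=1))

    # Assumption: a random unit vector in the top-k PC span has expected
    # variance equal to the mean of the top-k eigenvalues, so choose the k
    # whose mean is closest to the probe's variance.
    mean_topk = np.cumsum(eigvals) / np.arange(1, len(eigvals) + 1)
    k = int(np.argmin(np.abs(mean_topk - probe_var))) + 1

    # Sample n_controls random unit directions inside that subspace.
    coeffs = rng.standard_normal((n_controls, k))
    controls = coeffs @ vt[:k]
    controls /= np.linalg.norm(controls, axis=1, keepdims=True)

    control_scores = [persistence(acts_early, acts_late, c) for c in controls]
    return persistence(acts_early, acts_late, probe) - float(np.median(control_scores))
```

Under these assumptions, a positive return value means the probe is more persistent than random directions of comparable variance; averaging this quantity over all probes would give a figure of the same kind as the +0.077 reported here.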

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Emotion probe persistence correlation of 0.214 in Cogito v2.1 vs 0.099 for random vectors · finding · 0.884
Quantifies emotion feature persistence above random baseline in Cogito across 240 multi-turn conversations
In Cogito v2.1, average residual persistence above variance-matched probes is +0.077 (p = 1.5e-27, 157 of 171 probes positive). · finding · 0.859
Demonstrates emotion-specific persistence beyond variance effects in Cogito
Emotion probe persistence (token-0 to token-100 correlation) in Cogito v2.1 is 0.214, compared to 0.099 for random unit vectors in 7168D space. · finding · 0.859
Quantitative measure of emotion feature persistence vs random baseline in Cogito
SAE emotion subspace overlap correlates with variance-residualized persistence in Cogito: Spearman +0.413, p = 4.4e-196. · finding · 0.825
Strong positive relationship between emotion alignment and SAE feature persistence in Cogito
Lower (more central) PCs of emotion feature activations are more persistent than higher-rank (noisier) PCs in both Kimi and Cogito, above variance-matched baselines. · finding · 0.792
Supports that persistence is genuinely tied to emotion structure rather than measurement artifact
Emotion probes are more persistent than variance-matched random probes, indicating emotion-specific persistence beyond autoregressive dynamics. · claim · 0.791
Core empirical claim distinguishing emotion persistence from generic high-variance probe persistence
SAE feature emotion subspace overlap correlates with persistence in Cogito: Spearman +0.413, p=4.4e-196 · finding · 0.789
Demonstrates that SAE features more aligned with the emotion subspace are more persistent in Cogito after variance control
17 of 83 tested emotions show significant association between self-eval transcript word mention and cosine similarity to emotion probe · finding · 0.784
Validates that agentic self-evaluation captures genuine emotional content of probes