finding

active

finding:in-cogito-v2-1-average-residual-persistence-above-variance-matched-probes-is-0-077-p-1-5e-27-157-of-171-probes-positive

In Cogito v2.1, average residual persistence above variance-matched probes is +0.077 (p = 1.5e-27, 157 of 171 probes positive).

Demonstrates emotion-specific persistence beyond variance effects in Cogito
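
As a rough cross-check on the 157-of-171 count, an exact sign test asks how surprising that split would be if positive and negative residuals were equally likely. The sketch below is illustrative only: the function name is hypothetical, and this page does not say which test produced p = 1.5e-27, so the printed value need not match it.

```python
from math import comb


def sign_test_p(positives: int, total: int) -> float:
    """Two-sided exact sign test p-value against a 50/50 null (hypothetical helper)."""
    # Use the larger of the two counts so the tail is always the extreme side.
    k = max(positives, total - positives)
    # Number of outcomes at least as extreme as k, out of 2**total equally likely ones.
    tail = sum(comb(total, i) for i in range(k, total + 1))
    # Double the one-sided tail and cap at 1.0.
    return min(1.0, 2 * tail / 2 ** total)


# Counts taken from the finding above: 157 of 171 probes positive.
print(f"{sign_test_p(157, 171):.3g}")
```

A sign test ignores the size of each residual, so it is a weaker but assumption-light complement to a test on the mean of +0.077.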

Source paper

extracted_from

Persistence and Introspection of Emotion Features

Scott Sauers · Imago · Janus · Antra Tessera

Neighborhood — ranked by edge-count

Claims (2)

claim

Emotion probes are more persistent than variance-matched random probes, indicating emotion-specific persistence beyond autoregressive dynamics.
associated_with · supports
Core empirical claim distinguishing emotion persistence from generic high-variance probe persistence
Emotion features are not strictly locally scoped; they are bursty with a long tail of slow change persisting over 100 tokens.
supports
Main conclusion about the temporal dynamics of emotion features
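
One way to picture the variance-matched comparison in the first claim: pair each emotion probe with the control probe whose activation variance is closest, then subtract the control's lag autocorrelation from the emotion probe's. The sketch below makes that concrete on synthetic data. All names, the nearest-variance matching rule, and the lag of 1 are assumptions for illustration; the paper's actual procedure is not described on this page.

```python
import random
from statistics import fmean, pvariance


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def lag_autocorr(series, lag):
    """Correlation between a series and itself shifted by `lag` tokens."""
    return pearson(series[:-lag], series[lag:])


def variance_matched_gaps(emotion, control, lag):
    """Per-probe persistence minus that of the nearest-variance control (assumed rule)."""
    control_var = [pvariance(c) for c in control]
    gaps = []
    for series in emotion:
        v = pvariance(series)
        match = min(range(len(control)), key=lambda i: abs(control_var[i] - v))
        gaps.append(lag_autocorr(series, lag) - lag_autocorr(control[match], lag))
    return gaps


rng = random.Random(0)


def ar1(phi, n):
    """Synthetic autocorrelated series standing in for a persistent probe."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, 1))
    return x


# Synthetic stand-ins: persistent "emotion" series vs. white-noise controls.
emotion = [ar1(0.9, 500) for _ in range(5)]
control = [[rng.gauss(0, s) for _ in range(500)] for s in (0.5, 1.0, 2.0, 3.0)]
gaps = variance_matched_gaps(emotion, control, lag=1)
print(fmean(gaps), sum(g > 0 for g in gaps), "of", len(gaps), "positive")
```

The mean of the per-probe gaps and the count of positive gaps correspond in form to the "+0.077" and "157 of 171" figures, though the synthetic numbers here carry no empirical meaning.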

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Cogito emotion probe residual autocorrelation +0.077 above variance-matched controls (p=1.5e-27, 157/171 probes positive) · finding · 0.859
Demonstrates that Cogito emotion probes are persistently active beyond what is explained by their variance alone
Emotion probe persistence correlation of 0.214 in Cogito v2.1 vs 0.099 for random vectors · finding · 0.842
Quantifies emotion feature persistence above random baseline in Cogito across 240 multi-turn conversations
Emotion probe persistence (token-0 to token-100 correlation) in Cogito v2.1 is 0.214, compared to 0.099 for random unit vectors in 7168D space. · finding · 0.836
Quantitative measure of emotion feature persistence vs random baseline in Cogito
SAE emotion subspace overlap correlates with variance-residualized persistence in Cogito: Spearman +0.413, p = 4.4e-196. · finding · 0.826
Strong positive relationship between emotion alignment and SAE feature persistence in Cogito
SAE feature emotion subspace overlap correlates with persistence in Cogito: Spearman +0.413, p=4.4e-196 · finding · 0.788
Demonstrates that SAE features more aligned with the emotion subspace are more persistent in Cogito after variance control
Experimental condition adjective embeddings show mean cosine similarity 0.657 (n=9,591 pairs), significantly higher than history (0.628, t=15.8, p=1.4×10⁻⁵⁵), conceptual (0.587, t=38.5, p<10⁻³⁰⁰), and zero-shot (0.603, t=35.1, p=4.3×10⁻²⁶²) · finding · 0.768
Core result of Experiment 3: cross-model semantic convergence under self-referential processing
Cosine similarity between perturbed and baseline residual streams returns toward 1.0 and projection onto injection direction decays exponentially over subsequent layers · finding · 0.758
Mechanistic evidence that network actively attenuates injected perturbations, explaining late-layer introspection failure
Probe achieves selectivity of 4.20 on pythia-410m, slightly exceeding DAS selectivity of 3.96 · finding · 0.742
Key result showing that for models larger than pythia-70m, probe selectivity matches or exceeds DAS selectivity