Imago

Participant in an Anima Labs conversation discussing autoregressive recurrence.

Authored papers (1)

Persistence and Introspection of Emotion Features
Emotion features in large language models are bursty but not strictly locally scoped: they exhibit long-tail persistence extending well beyond 100 tokens, and this persistence is specifically tied to emotional content rather than being an artifact of activation variance or autoregressive dynamics. Across 240 multi-turn conversations per model, 171 emotion probes yield token-0-to-token-100 correlations of 0.214 in Cogito v2.1 and 0.367 in Kimi K2.5, compared to only 0.099 and 0.117 for random unit vectors in the same 7168-dimensional layer-40 activation space. After variance-matching each emotion probe against 20 randomly drawn vectors from the top-k eigenspace of the layer-40 covariance matrix, residual autocorrelation averages +0.077 in Cogito (p = 1.5e-27, 157/171 probes positive) and +0.170 in Kimi (p = 6.7e-30, 167/171 positive). The paper introduces agentic self-evaluation — a method in which Kimi K2.5 uses a real-time steering tool on its own SAE features and rates the emotional valence of what it experiences — and finds that self-reported emotionality of SAE features correlates with persistence above variance-matched controls (ρ = +0.124, p = 0.0001), replicating the probe-based result without sharing its potential confounds. SAE features whose direction overlaps more with the 171-dimensional emotion subspace are also more persistent (Spearman +0.413, p = 4.4e-196 in Cogito). The paper argues this implies that LLMs maintain something analogous to lingering affective states — not merely local semantic activation — and that agentic self-steering may offer a scalable route to interpreting internal representations beyond what passive probing methods can detect.
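The token-0-to-token-100 comparison described in the abstract can be illustrated with a minimal sketch. It assumes activations are available as an array of shape (conversations, tokens, dimensions) and uses synthetic data in a small 64-dimensional space rather than real layer-40 activations. All function names here are hypothetical, and the controls are plain random unit vectors, so this corresponds only to the first comparison in the abstract; it does not implement the variance-matching against the top-k eigenspace of the covariance matrix.

```python
import numpy as np


def lag_correlation(acts, direction, lag=100):
    """Pearson correlation, across conversations, between the projection
    onto `direction` at token 0 and at token `lag`."""
    proj = acts @ direction  # shape: (n_conversations, n_tokens)
    return float(np.corrcoef(proj[:, 0], proj[:, lag])[0, 1])


def random_unit_vectors(n, dim, rng):
    """Draw n random directions and normalize each to unit length."""
    v = rng.standard_normal((n, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def residual_persistence(acts, probe, n_controls=20, lag=100, seed=0):
    """Probe lag correlation minus the mean over random control directions.
    Simplification: the controls are NOT variance-matched."""
    rng = np.random.default_rng(seed)
    controls = random_unit_vectors(n_controls, acts.shape[-1], rng)
    control_mean = np.mean([lag_correlation(acts, c, lag) for c in controls])
    return lag_correlation(acts, probe, lag) - control_mean


def synthetic_activations(probe, n_conv=240, n_tokens=101, rng=None):
    """Toy data (an assumption, not the paper's data): i.i.d. noise plus a
    per-conversation offset along `probe` that persists over all tokens."""
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.standard_normal((n_conv, n_tokens, probe.shape[0]))
    state = rng.standard_normal(n_conv)
    return noise + state[:, None, None] * probe


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    probe = random_unit_vectors(1, 64, rng)[0]
    acts = synthetic_activations(probe, rng=rng)
    print("probe lag-100 correlation:", lag_correlation(acts, probe))
    print("residual over random controls:", residual_persistence(acts, probe))
```

In this toy setup the persistent offset lives only along the probe direction, so the probe correlation should exceed the random-direction baseline. With real activations, the variance-matched controls described in the abstract would be needed to rule out variance as the explanation.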

Affiliations (1)

Anima Labs (institute)

Co-authors (4)

Antra Tessera (2 shared)
janus (2 shared)
cube_flipper (1 shared)
Scott Sauers (1 shared)

Other inbound relations (1)

mentions: Janus Information Flow Transformers 2025 (paper)

Recent mentions (2)

papers-typed: anima-labs-phenomenology-pt1.md
papers-typed: janus-information-flow-transformers-2025.md