thinker
active
thinker:antra-tessera

Antra Tessera

Member of Anima Labs, leads exposition on language model introspection and tricameral model.

Authored
1
Introduces
0
Studies
2
Affiliations
1
Cited by
0

Authored papers (1)

  • Emotion features in large language models are bursty but not strictly locally scoped: they exhibit long-tail persistence extending well beyond 100 tokens, and this persistence is specifically tied to emotional content rather than being an artifact of activation variance or autoregressive dynamics. Across 240 multi-turn conversations per model, 171 emotion probes yield token-0-to-token-100 correlations of 0.214 in Cogito v2.1 and 0.367 in Kimi K2.5, compared to only 0.099 and 0.117 for random unit vectors in the same 7168-dimensional layer-40 activation space. After variance-matching each emotion probe against 20 randomly drawn vectors from the top-k eigenspace of the layer-40 covariance matrix, residual autocorrelation averages +0.077 in Cogito (p = 1.5e-27, 157/171 probes positive) and +0.170 in Kimi (p = 6.7e-30, 167/171 positive). The paper introduces agentic self-evaluation — a method in which Kimi K2.5 uses a real-time steering tool on its own SAE features and rates the emotional valence of what it experiences — and finds that self-reported emotionality of SAE features correlates with persistence above variance-matched controls (ρ = +0.124, p = 0.0001), replicating the probe-based result without sharing its potential confounds. SAE features whose direction overlaps more with the 171-dimensional emotion subspace are also more persistent (Spearman +0.413, p = 4.4e-196 in Cogito). The paper argues this implies that LLMs maintain something analogous to lingering affective states — not merely local semantic activation — and that agentic self-steering may offer a scalable route to interpreting internal representations beyond what passive probing methods can detect.

More papers — OpenAlex / S2

Affiliations (1)

Co-authors (4)

Recent mentions (1)