Emotion Features in LLMs

Internal representations that encode emotion concepts in large language models, identified with probing and sparse autoencoder (SAE) methods

Neighborhood — ranked by edge-count

Papers (1)

paper

Persistence and Introspection of Emotion Features
studies

Concepts (2)

concept

Emotion Concepts and their Function in a Large Language Model
introduces
The prior Anthropic paper whose findings about emotion features in Claude this paper builds upon and extends
Emotion Subspace
associated_with
The subspace of activation space spanned by the 171 orthogonalized emotion probe vectors, used to measure SAE feature emotional alignment
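
To make the Emotion Subspace entry above concrete, here is a minimal sketch of measuring how strongly a feature direction aligns with the span of a set of probe vectors. Everything in it is an assumption for illustration: random vectors stand in for the 171 probe vectors and for an SAE feature direction, QR decomposition is used as one possible orthogonalization, alignment is defined as the fraction of squared norm inside the subspace, and the function names are hypothetical. The source does not specify any of these details.

```python
import numpy as np


def orthonormal_basis(probes: np.ndarray) -> np.ndarray:
    """Return an orthonormal basis (as rows) for the span of the probe vectors.

    Assumption: QR is used here; the source only says the probes are orthogonalized.
    """
    # Reduced QR of the transpose: columns of q span the same space as the probe rows.
    q, _ = np.linalg.qr(probes.T)
    return q.T


def subspace_alignment(feature: np.ndarray, basis: np.ndarray) -> float:
    """Fraction of the feature's squared norm that lies inside the subspace (0 to 1)."""
    coords = basis @ feature  # coordinates of the projection in the orthonormal basis
    return float(coords @ coords / (feature @ feature))


# Synthetic stand-ins (hypothetical sizes and data, not real model activations).
rng = np.random.default_rng(0)
d_model, n_probes = 512, 171
probes = rng.normal(size=(n_probes, d_model))
basis = orthonormal_basis(probes)

inside = probes.T @ rng.normal(size=n_probes)  # a direction built from the probes
random_dir = rng.normal(size=d_model)          # an unrelated direction

print("in-span direction:", subspace_alignment(inside, basis))
print("random direction: ", subspace_alignment(random_dir, basis))
```

A direction built from the probes scores close to 1, while an unrelated direction scores well below it; a score like this is one simple way to rank features by emotional alignment.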

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Emotion geometry in LLM activations · concept · 0.830
Emotion emerges early, peaks in middle layers, sharpens with scale, and persists across tokens in LLM activations per Zhang & Zhong 2025
Are LLM emotion states encoded only selectively in token positions where they are operative, or in a more complex non-linear manner? · question · 0.802
Question raised by Anthropic and partially addressed by this paper's persistence evidence
Emotion features in LLMs are genuinely more persistent than variance-matched random features, indicating stateful emotional encoding beyond autoregressive dynamics · claim · 0.800
Central interpretive claim of the paper supported by multiple convergent analyses
We hypothesize that persistently active emotional state representations exist in LLMs but may be missed by standard probing methods. · hypothesis · 0.793
Open hypothesis from the Anthropic paper that motivates this work
Emotive states in LLMs · concept · 0.792
Directions in activation space associated with contrastive pairs of emotive concepts, studied in this paper as targets for introspection
Linear Representation of Concepts in LLMs · concept · 0.783
The finding that interpretable concepts including character traits are encoded as linear directions in transformer residual streams
To what extent is there persistence of emotional state beyond what is expected merely from the autoregressive nature of LLMs? · question · 0.775
The central research question motivating the paper
emotion feature persistence · concept · 0.757
The phenomenon that emotion feature activations remain elevated above baseline beyond local token bursts, measurable as long-range correlation
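
The long-range correlation mentioned in the last entry can be illustrated with a small sketch. It is not the paper's procedure: a synthetic AR(1) series stands in for per-token feature activations, Pearson autocorrelation at a fixed lag is one possible persistence measure, a shuffled copy of the same values serves as a baseline with no temporal structure, and the function name is hypothetical.

```python
import numpy as np


def lagged_autocorrelation(activations: np.ndarray, lag: int) -> float:
    """Pearson correlation between the series and itself shifted by `lag` tokens."""
    if lag <= 0 or lag >= len(activations):
        raise ValueError("lag must be in [1, len(activations) - 1]")
    return float(np.corrcoef(activations[:-lag], activations[lag:])[0, 1])


# Synthetic stand-in for one feature's activation across tokens (assumption).
rng = np.random.default_rng(0)
n_tokens = 5000
noise = rng.normal(size=n_tokens)
persistent = np.zeros(n_tokens)
for t in range(1, n_tokens):
    # Slowly decaying AR(1) process: each value carries over 95% of the previous one.
    persistent[t] = 0.95 * persistent[t - 1] + noise[t]

# Baseline with the same values (hence the same variance) but shuffled token order.
shuffled = rng.permutation(persistent)

for lag in (1, 10, 50):
    print(
        lag,
        lagged_autocorrelation(persistent, lag),
        lagged_autocorrelation(shuffled, lag),
    )
```

A series whose correlation stays well above the shuffled baseline at larger lags is persistent in the sense described above; a series that only spikes locally would fall back to the baseline quickly.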