Emotive states in LLMs

Directions in activation space associated with contrastive emotive concept pairs, studied in this paper as targets for introspection.

Neighborhood — ranked by edge-count

Papers (1)

paper

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation
studies

Methods (1)

method

Linear Probe
studies
Simple linear classifiers trained on model activations; used as the probing technique within the introduced method.
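
To make the idea of a linear probe concrete, here is a minimal sketch, not the paper's implementation. It assumes the probe is a logistic-regression classifier and uses synthetic vectors as stand-ins for model activations; the function names, dimensionality, and hyperparameters are all hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def probe_score(w, b, x):
    # Signed projection of an activation vector onto the probe direction.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_probe(activations, labels, lr=0.5, epochs=200):
    """Fit logistic-regression weights (w, b) by batch gradient descent."""
    dim = len(activations[0])
    n = len(activations)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(activations, labels):
            err = sigmoid(probe_score(w, b, x)) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def make_synthetic_activations(n_per_class=50, dim=4, seed=0):
    """Synthetic stand-in for activations: the two classes of a contrastive
    pair (e.g. sad=0, happy=1) differ only along dimension 0. Assumption."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label, center in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 0.5) for _ in range(dim)]
            x[0] += center
            xs.append(x)
            ys.append(label)
    return xs, ys

xs, ys = make_synthetic_activations()
w, b = train_linear_probe(xs, ys)
accuracy = sum((probe_score(w, b, x) > 0) == bool(y) for x, y in zip(xs, ys)) / len(xs)
print(f"training accuracy: {accuracy:.2f}")
```

The learned weight vector `w` plays the role of a direction in activation space, and `probe_score` gives a scalar reading along that direction for any new activation.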

Concepts (5)

concept

Emotion geometry in LLM activations
associated_with
Emotion representations emerge early, peak in middle layers, sharpen with scale, and persist across tokens in LLM activations (Zhang & Zhong 2025)
Focus probe (distracted vs. focused)
implements
One of four emotive concept probes trained; contrastive pair distracted/focused with best layer 10 in LLaMA-3.2-3B
Impulsivity probe (impulsive vs. planning)
implements
One of four emotive concept probes trained; contrastive pair impulsive/planning with best layer 13 in LLaMA-3.2-3B
Interest probe (bored vs. interested)
implements
One of four emotive concept probes trained; contrastive pair bored/interested with best layer 14 in LLaMA-3.2-3B
Wellbeing probe (sad vs. happy)
implements
One of four emotive concept probes trained; contrastive pair sad/happy with best layer 16 in LLaMA-3.2-3B

Findings (1)

finding

Interest probe score drifts positively across turns: LMM slope=0.005, p=4.12×10⁻¹⁴ in LLaMA-3.2-3B
supports
Demonstrates genuine internal-state dynamics in LLMs during multi-turn conversation
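
As a rough illustration of what a per-turn slope means, the sketch below estimates how a probe score changes with turn index while allowing each conversation its own baseline. It is a simplified stand-in, not the reported analysis: the finding's "LMM" is presumably a linear mixed model, whereas this uses a pooled within-conversation least-squares slope. The function name and the noise-free synthetic data are assumptions.

```python
def within_conversation_slope(conversations):
    """Pooled least-squares slope of probe score on turn index.

    Each conversation is a list of (turn_index, probe_score) pairs.
    Centering within each conversation removes its own baseline,
    roughly what a random intercept would absorb in a mixed model.
    """
    num = 0.0
    den = 0.0
    for conv in conversations:
        mean_t = sum(t for t, _ in conv) / len(conv)
        mean_s = sum(s for _, s in conv) / len(conv)
        for t, s in conv:
            num += (t - mean_t) * (s - mean_s)
            den += (t - mean_t) ** 2
    return num / den

# Synthetic data (assumption): three conversations with different baselines
# and an identical per-turn drift of 0.005, with no noise added.
conversations = [
    [(t, intercept + 0.005 * t) for t in range(10)]
    for intercept in (0.1, 0.4, -0.2)
]
slope = within_conversation_slope(conversations)
print(f"estimated slope per turn: {slope:.4f}")
```

A positive slope here corresponds to the probe score drifting upward as the conversation proceeds. A mixed model would additionally provide standard errors and a p-value, which this sketch does not compute.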

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Hallucination in LLMs · concept · 0.812
Problem cited as a shortcoming of current LLMs; PRH predicts hallucinations should decrease with scale
Emotion Features in LLMs · concept · 0.792
Internal representations encoding emotion concepts in large language models, identified by probing and SAE methods
Can instruction-tuned LLMs perform quantitative introspection of emotive states in conversation? · question · 0.792
Central research question motivating the entire paper
Are LLM emotion states encoded only selectively in token positions where they are operative, or in a more complex non-linear manner? · question · 0.782
Question raised by Anthropic and partially addressed by this paper's persistence evidence
The coupling between LLM self-report and internal emotive state is causal, not merely correlational · claim · 0.780
Supported by same-concept steering experiments showing monotonic shifts in self-report with activation steering
We hypothesize that persistently active emotional state representations exist in LLMs but may be missed by standard probing methods. · hypothesis · 0.776
Open hypothesis from the Anthropic paper that motivates this work
Emergent Abilities of LLMs · concept · 0.758
Prior work documenting abrupt capability changes under scale; UCCT provides a measurable predictor for when they occur
Emotion features in LLMs are genuinely more persistent than variance-matched random features, indicating stateful emotional encoding beyond autoregressive dynamics · claim · 0.749
Central interpretive claim of the paper supported by multiple convergent analyses
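
The selection rule behind this list (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration under assumptions: the two-dimensional vectors, entity names, and function names are hypothetical, and how the page actually computes its embeddings is not stated.

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def related_by_similarity(target, candidates, typed_neighbors, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that have no
    typed edge to the target, ranked by descending similarity."""
    hits = []
    for name, vec in candidates.items():
        if name in typed_neighbors:
            continue  # already connected by a typed relation
        score = cosine_similarity(target, vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Toy embeddings (assumption): names and vectors are illustrative only.
target = [1.0, 0.0]
candidates = {
    "near": [1.0, 0.1],
    "orthogonal": [0.0, 1.0],
    "diagonal": [1.0, 1.0],
    "already_linked": [1.0, 0.0],
}
related = related_by_similarity(target, candidates, typed_neighbors={"already_linked"})
for name, score in related:
    print(f"{name} · {score:.3f}")
```

Entities that already share a typed edge are skipped even when their similarity is high, which is why such a list surfaces only candidates for new edges or unrecognized duplicates.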