concept
active
concept:emotive-states-in-llms · Emotive states in LLMs
Directions in activation space associated with contrastive emotive concept pairs, studied in this paper as targets for introspection.
Neighborhood — ranked by edge-count
Papers (1)
paper
Methods (1)
method
- Linear Probe · studies · Simple linear classifiers trained on model activations, used as the probing technique within the introduced method.
Concepts (5)
concept
- Emotion geometry in LLM activations · associated_with · Emotion emerges early, peaks in middle layers, sharpens with scale, and persists across tokens in LLM activations, per Zhang & Zhong 2025
- Focus probe (distracted vs. focused) · implements · One of four emotive concept probes trained; contrastive pair distracted/focused, with best layer 10 in LLaMA-3.2-3B
- One of four emotive concept probes trained; contrastive pair impulsive/planning with best layer 13 in LLaMA-3.2-3B
- Interest probe (bored vs. interested) · implements · One of four emotive concept probes trained; contrastive pair bored/interested, with best layer 14 in LLaMA-3.2-3B
- Wellbeing probe (sad vs. happy) · implements · One of four emotive concept probes trained; contrastive pair sad/happy, with best layer 16 in LLaMA-3.2-3B
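As a rough illustration of how a contrastive-pair probe and its "best layer" might be obtained, the sketch below fits a difference-of-means linear probe per layer and picks the layer with the highest held-out accuracy. It is not the paper's procedure: the probe type, the synthetic Gaussian "activations", the layer indices, and every function name here are assumptions made for the example.

```python
import random

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit_probe(neg, pos):
    """Difference-of-means probe: direction w plus a bias placing the boundary at the midpoint."""
    mu_neg, mu_pos = mean_vector(neg), mean_vector(pos)
    w = [p - q for p, q in zip(mu_pos, mu_neg)]
    mid = [(p + q) / 2 for p, q in zip(mu_pos, mu_neg)]
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, b

def score(w, b, x):
    """Signed probe score; positive means closer to the positive pole of the pair."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def accuracy(w, b, neg, pos):
    """Fraction of examples falling on the correct side of the probe boundary."""
    hits = sum(score(w, b, x) < 0 for x in neg) + sum(score(w, b, x) > 0 for x in pos)
    return hits / (len(neg) + len(pos))

def synthetic_layer(rng, separation, n=200, dim=8):
    """Hypothetical stand-in for one layer's activations: two Gaussian clouds."""
    neg = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    pos = [[rng.gauss(separation if j == 0 else 0.0, 1.0) for j in range(dim)]
           for _ in range(n)]
    return neg, pos

def best_layer(separations, seed=0):
    """Fit one probe per layer on half the data; rank layers by held-out accuracy."""
    rng = random.Random(seed)
    results = {}
    for layer, sep in separations.items():
        neg, pos = synthetic_layer(rng, sep)
        half = len(neg) // 2
        w, b = fit_probe(neg[:half], pos[:half])
        results[layer] = accuracy(w, b, neg[half:], pos[half:])
    return max(results, key=results.get), results

# Made-up per-layer signal strengths; real activations would come from the model.
SEPARATIONS = {0: 0.2, 1: 4.0, 2: 1.0}
layer, accs = best_layer(SEPARATIONS)
print(layer, accs)
```

With real data, `synthetic_layer` would be replaced by activations collected at each layer for the two poles of a pair (for example sad vs. happy), and a logistic-regression probe could be substituted for the difference-of-means fit.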
Findings (1)
finding
- Interest probe score drifts positively across turns: LMM slope=0.005, p=4.12×10⁻¹⁴ in LLaMA-3.2-3B · supports · Demonstrates genuine internal-state dynamics in LLMs during multi-turn conversation
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Problem cited as a shortcoming of current LLMs; PRH predicts hallucinations should decrease with scale
- Internal representations encoding emotion concepts in large language models, identified by probing and SAE methods
- Can instruction-tuned LLMs perform quantitative introspection of emotive states in conversation? · question · 0.792 · Central research question motivating the entire paper
- Question raised by Anthropic and partially addressed by this paper's persistence evidence
- The coupling between LLM self-report and internal emotive state is causal, not merely correlational · claim · 0.780 · Supported by same-concept steering experiments showing monotonic shifts in self-report with activation steering
- We hypothesize that persistently active emotional state representations exist in LLMs but may be missed by standard probing methods. · hypothesis · 0.776 · Open hypothesis from the Anthropic paper that motivates this work
- Prior work documenting abrupt capability changes under scale; UCCT provides a measurable predictor for when they occur
- Central interpretive claim of the paper supported by multiple convergent analyses
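The "cosine ≥ 0.65 · no typed edge" filter behind this list can be sketched as follows. This is only an assumed reading of that label: the page does not say how entities are embedded or scored, so the toy vectors, the entity names, and the `similar_untyped` helper are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similar_untyped(target, embeddings, typed_neighbors, threshold=0.65):
    """Entities at or above the threshold that have no typed edge to the target, best first."""
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in typed_neighbors:
            continue  # skip the entity itself and anything already linked
        sim = cosine(embeddings[target], vec)
        if sim >= threshold:
            hits.append((name, round(sim, 3)))
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Toy 2-D embeddings; a real graph would use high-dimensional text embeddings.
EMB = {
    "target": [1.0, 0.0],
    "near": [0.8, 0.6],   # similar and unlinked: should be listed
    "typed": [1.0, 0.1],  # similar but already linked by a typed edge
    "far": [0.0, 1.0],    # orthogonal: below the threshold
}
result = similar_untyped("target", EMB, typed_neighbors={"typed"})
print(result)
```

Excluding already-linked entities is what makes the remaining hits useful as candidates for new edges or duplicate detection.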