concept
active
concept:emotion-concepts-and-their-function-in-a-large-language-model

Emotion Concepts and their Function in a Large Language Model

The prior Anthropic paper whose findings about emotion features in Claude this paper builds upon and extends

Neighborhood — ranked by edge-count

Methods (1)

method
  • Method for building 171 emotion probes by generating stories, embedding them, regressing out Gemini embeddings, and averaging residual activations per emotion
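
The probe-building steps named in this method entry can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name `build_probes`, the use of ordinary least squares with an intercept, and the random toy arrays standing in for story activations and Gemini embeddings are all assumptions made for the example.

```python
import numpy as np

def build_probes(activations, nuisance, labels):
    """Hypothetical helper: mean residual activation per emotion label.

    activations: (n_stories, d_model) model activations, one row per story
    nuisance:    (n_stories, k) embeddings to regress out (assumed stand-in
                 for the Gemini embeddings mentioned in the entry)
    labels:      length-n_stories sequence of emotion names
    """
    # Intercept column so the regression also removes the global mean.
    X = np.column_stack([np.ones(len(nuisance)), nuisance])
    # Least-squares fit of activations on the nuisance embeddings.
    coef, *_ = np.linalg.lstsq(X, activations, rcond=None)
    # Residual = the part of each activation not explained by the embeddings.
    residual = activations - X @ coef
    labels = np.asarray(labels)
    # One probe direction per emotion: the average residual over its stories.
    return {str(lab): residual[labels == lab].mean(axis=0)
            for lab in np.unique(labels)}

# Toy usage with random data (assumption: real inputs would come from a model).
rng = np.random.default_rng(0)
acts = rng.normal(size=(6, 4))
emb = rng.normal(size=(6, 2))
probes = build_probes(acts, emb, ["joy", "fear", "joy", "fear", "joy", "fear"])
```

In this sketch each probe is a single vector in activation space; with 171 emotion labels the returned dictionary would hold 171 such vectors.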

Concepts (2)

concept
  • Internal representations encoding emotion concepts in large language models, identified by probing and SAE methods
  • Emotion-encoding directions in LLM activation space that can be amplified or suppressed via activation steering to causally drive model behavior
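
Activation steering along an emotion direction, as described in the second concept entry, is commonly written as adding a scaled unit vector to a hidden state. The sketch below assumes that additive form; the function name `steer`, the scale parameter `alpha`, and the toy vectors are hypothetical and are not taken from the paper.

```python
import numpy as np

def steer(hidden, direction, alpha):
    """Hypothetical helper: shift hidden states along a direction.

    hidden:    (..., d_model) hidden states
    direction: (d_model,) emotion direction, e.g. one probe vector
    alpha:     positive to amplify the emotion, negative to suppress it
    """
    # Normalize so alpha is measured in units of activation norm.
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

# Toy usage: push a random hidden state 2.0 units along the direction.
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))
v = rng.normal(size=4)
h_steered = steer(h, v, alpha=2.0)
```

The projection of each steered state onto the unit direction changes by exactly `alpha`, while components orthogonal to the direction are left untouched.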

Hypotheses (1)

hypothesis

Institutes (1)

institute
  • Anthropic
    authored · mentions
    Lab behind the Claude models and the Constitutional AI training approach; represents the highest baseline scores and the lowest prompt lift.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.