PCA of Emotion Feature Activations

PCA applied to the activations of 171 emotion probes across all tokens, producing variance-ordered linear combinations of the probes, to test whether lower-rank PCs are more persistent than higher-rank PCs
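
A minimal sketch of this kind of analysis, under stated assumptions. The source does not specify how persistence is scored, so the lag autocorrelation used here is only one plausible reading of "long-range correlation". The function names (pca_scores, lag_autocorr, persistence_by_pc), the lag value, and the synthetic activations are all hypothetical and stand in for real probe outputs.

```python
import numpy as np


def pca_scores(acts):
    """Project activations of shape (n_tokens, n_probes) onto their PCs.

    Returns (scores, explained_var), with components ordered by
    decreasing variance.
    """
    centered = acts - acts.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, sorted by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt.T
    explained_var = s ** 2 / (acts.shape[0] - 1)
    return scores, explained_var


def lag_autocorr(x, lag):
    """Pearson correlation between x[t] and x[t + lag], for lag >= 1."""
    a = x[:-lag] - x[:-lag].mean()
    b = x[lag:] - x[lag:].mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return float(a @ b / denom) if denom > 0 else 0.0


def persistence_by_pc(acts, lag):
    """Assumed persistence score: lag autocorrelation of each PC's scores."""
    scores, explained_var = pca_scores(acts)
    persistence = np.array(
        [lag_autocorr(scores[:, k], lag) for k in range(scores.shape[1])]
    )
    return persistence, explained_var


# Synthetic stand-in for real probe activations (hypothetical data):
# one slowly varying AR(1) signal mixed into 171 probes, plus white noise.
rng = np.random.default_rng(0)
n_tokens, n_probes = 5000, 171
slow = np.zeros(n_tokens)
for t in range(1, n_tokens):
    slow[t] = 0.95 * slow[t - 1] + rng.normal()
loadings = rng.normal(size=n_probes)
acts = np.outer(slow, loadings) + rng.normal(size=(n_tokens, n_probes))

persistence, explained_var = persistence_by_pc(acts, lag=10)
print("PC1 persistence:", persistence[0])
print("last PC persistence:", persistence[-1])
```

In this toy setup the leading PC captures the slow signal and so shows a high lag autocorrelation, while the trailing PCs are dominated by white noise and show almost none. With real activations, the comparison of interest would be how this score varies with PC rank.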

Neighborhood — ranked by edge-count

finding

Lower-rank (more central) emotion PCs are more persistent than higher-rank (noisier) PCs in both Kimi and Cogito
introduces
Rules out the possibility that persistence is an artifact of probe construction, since the noise dimensions are not similarly persistent

claim

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Principal components analysis (PCA) · method · 0.805
Statistical method used to analyze neural activity data.
PCA Visualization · method · 0.777
Used to visually inspect separation of truth-related directions in model activation space across layers
Mechanism by which activation of an emotion feature sometimes leads to later suppression of that same feature · question · 0.770
Identified research gap: the paper observes anti-persistence but has no explanation for it
Why does activation of an emotion feature sometimes lead to its later suppression? · question · 0.764
Open mechanistic question arising from the causal steering experiment
Principal Component Analysis Visualization · method · 0.733
Used to visualize LLM true/false representations, revealing clear linear structure separating true from false statements
Anti-Persistence of Emotion Features · concept · 0.730
The phenomenon where activating an emotion feature leads to subsequent below-baseline activation of that feature
emotion feature persistence · concept · 0.727
The phenomenon that emotion feature activations remain elevated above baseline beyond local token bursts, measurable as long-range correlation
PCs of the emotion space and persistence · concept · 0.727
Analysis showing that lower-rank (more central) PCs of emotion feature activations are more persistent than higher-rank (noisier) PCs