method
active
method:pca-of-emotion-feature-activations
PCA of Emotion Feature Activations
Applies PCA to the activations of 171 emotion probes across all tokens, producing an ordered set of linear combinations (principal components), and tests whether the lower-rank PCs are more persistent than the higher-rank ones.
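A minimal sketch of the idea, under stated assumptions: activations are a `(n_tokens, 171)` matrix, PCA is computed by SVD of the centered matrix (so components come out ordered by explained variance), and "persistence" is approximated as the mean autocorrelation of each PC's score series at long lags. The helper names (`pca_scores`, `lagged_autocorr`, `persistence`), the lag range, and the synthetic data (one slow latent signal plus per-probe noise) are all hypothetical and are not taken from the underlying work.

```python
import numpy as np

N_PROBES = 171  # number of emotion probes, as stated above


def pca_scores(acts):
    """Project activations onto principal components, ordered by explained variance.

    acts: array of shape (n_tokens, n_probes), one row per token.
    Returns (scores, explained_ratio); column 0 of scores is the leading PC.
    """
    centered = acts - acts.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes; NumPy returns singular values in descending order.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt.T
    var = s ** 2
    return scores, var / var.sum()


def lagged_autocorr(x, lag):
    """Sample autocorrelation of a 1-D series at a positive lag."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return 0.0 if denom == 0 else float(np.dot(x[:-lag], x[lag:]) / denom)


def persistence(x, lags=range(10, 101, 10)):
    """Hypothetical persistence score: mean autocorrelation at long lags."""
    return float(np.mean([lagged_autocorr(x, lag) for lag in lags]))


# Synthetic stand-in data (hypothetical): one slow AR(1) latent signal loaded
# onto all probes, plus independent per-probe noise.
rng = np.random.default_rng(0)
n_tokens = 20000
latent = np.zeros(n_tokens)
for t in range(1, n_tokens):
    latent[t] = 0.99 * latent[t - 1] + rng.standard_normal()
loadings = rng.standard_normal(N_PROBES)
acts = np.outer(latent, loadings) + rng.standard_normal((n_tokens, N_PROBES))

scores, explained = pca_scores(acts)
per_pc = [persistence(scores[:, k]) for k in range(N_PROBES)]
print(f"PC1 persistence: {per_pc[0]:.2f}, last PC persistence: {per_pc[-1]:.2f}")
```

The sign of each principal axis returned by SVD is arbitrary, but autocorrelation is sign-invariant, so the persistence comparison between leading and trailing PCs is unaffected.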
Neighborhood (ranked by edge count)
Findings (1)
- Rules out the possibility that persistence is an artifact of probe construction, since noise dimensions (the higher-rank PCs) are not similarly persistent
Claims (1)
- Rules out a measurement-artifact explanation for the persistence finding
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Statistical method used to analyze neural activity data.
- Used to visually inspect separation of truth-related directions in model activation space across layers
- Mechanism by which activation of an emotion feature sometimes leads to later suppression of that same feature (question, cosine 0.770): identified research gap; the paper observes anti-persistence but has no explanation for it
- Open mechanistic question arising from the causal steering experiment
- Used to visualize LLM true/false representations, revealing clear linear structure separating true from false statements
- The phenomenon where activating an emotion feature leads to subsequent below-baseline activation of that feature
- The phenomenon that emotion feature activations remain elevated above baseline beyond local token bursts, measurable as long-range correlation
- Analysis showing that lower-rank (more central) PCs of emotion feature activations are more persistent than higher-rank (noisier) PCs
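The persistence and anti-persistence definitions above both compare a feature's activation after a burst against its baseline. One way to make that comparison concrete is an event-triggered average: find tokens where the feature spikes, then average its deviation from baseline at fixed offsets afterward. This is a minimal sketch under assumptions; `triggered_average`, the 2-standard-deviation trigger, the offsets, and both synthetic series are hypothetical illustrations, not the method used in the source work.

```python
import numpy as np


def triggered_average(x, offsets, z=2.0):
    """Mean deviation from baseline at each offset after above-threshold tokens.

    x: 1-D activation series for one emotion feature (or one PC score).
    offsets: positive token offsets to inspect after each trigger.
    z: trigger threshold in standard deviations above the mean (hypothetical choice).
    """
    baseline = x.mean()
    events = np.flatnonzero(x > baseline + z * x.std())
    events = events[events + max(offsets) < len(x)]  # keep windows inside the series
    if events.size == 0:
        return np.full(len(offsets), np.nan)
    return np.array([x[events + k].mean() - baseline for k in offsets])


rng = np.random.default_rng(1)
n = 20000

# Hypothetical persistent feature: AR(1) process, so bursts decay slowly.
persistent = np.zeros(n)
for t in range(1, n):
    persistent[t] = 0.95 * persistent[t - 1] + rng.standard_normal()

# Hypothetical anti-persistent feature: each input reappears with a negative
# sign 5 tokens later, pushing the feature below baseline after a burst.
e = rng.standard_normal(n + 5)
anti = e[5:] - 0.8 * e[:-5]

offsets = [1, 5, 10]
pers_avg = triggered_average(persistent, offsets)
anti_avg = triggered_average(anti, offsets)
print("persistent:", np.round(pers_avg, 2))
print("anti-persistent:", np.round(anti_avg, 2))
```

In this sketch, a persistent feature stays above baseline at offset 5, while the anti-persistent one drops below it; the same function could be applied to individual PC score series to compare lower- and higher-rank components.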