finding
active
finding:sae-feature-11100-associated-with-panic-93rd-percentile-emotion-subspace-fraction
SAE Feature #11100 associated with panic, 93rd percentile emotion subspace fraction
Shows high emotion-subspace overlap for a specific negative-emotion feature (panic)
Source paper
extracted_from: Scott Sauers · Imago · Janus · Antra Tessera
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- SAE feature #11100 (93rd percentile subspace fraction) induces reports of panic and urgency in Kimi K2.5 · finding · 0.877 · Qualitative illustration of a high-emotion-subspace-alignment SAE feature
- Shows that the highest emotion-subspace-overlap features induce distinctive thematic outputs
- SAE Feature #43713 associated with agentic defiance and rage, 99th percentile emotion subspace fraction · finding · 0.854 · High-subspace-fraction feature associated with defiant, uncontrollable agentic behavior in self-steering
- Highest-rated emotional SAE feature; self-report describes overwhelming positive emotional valence
- SAE Feature #92372 fires 666,235 times in corpus, associated with urgency vs. receptive calm dimension · finding · 0.801 · Example of a highly active SAE feature modulating urgency versus acceptance as an emotional dimension
- Fraction of an SAE feature's length lying inside the 171-dimensional subspace spanned by emotion probes, computed via SVD orthogonalization
- Highest emotion-subspace-overlap feature; induces genre-specific behavioral change rather than explicit emotional report
- Text-based and self-steered emotionality ratings for SAE features are correlated at only ρ = +0.051 (n.s.) · finding · 0.793 · Shows low agreement between the two evaluation modalities
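The subspace-fraction metric in the list above (the fraction of an SAE feature's length lying inside the 171-dimensional subspace spanned by emotion probes, computed via SVD orthogonalization), together with the percentile used in this finding's title, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical; probes and SAE feature directions are assumed to be rows of NumPy arrays in the same activation space; "length" is read as the Euclidean norm (the source may use squared norms instead); and the percentile is assumed to be the share of features with a strictly smaller fraction. The demo uses random stand-in data, not real probes or features.

```python
import numpy as np


def emotion_basis(probes: np.ndarray, rel_tol: float = 1e-10) -> np.ndarray:
    """Return an orthonormal basis (k x d) for the span of the probe rows (n x d).

    Uses the SVD: the right singular vectors with non-negligible singular
    values form an orthonormal basis for the row space of `probes`.
    """
    _, s, vt = np.linalg.svd(probes, full_matrices=False)
    # Drop directions whose singular value is tiny relative to the largest,
    # so linearly dependent probes do not add spurious dimensions.
    rank = int(np.sum(s > rel_tol * s.max()))
    return vt[:rank]


def subspace_fraction(feature: np.ndarray, basis: np.ndarray) -> float:
    """Norm of the feature's projection onto span(basis), divided by its full norm."""
    # Because the basis rows are orthonormal, the projection's norm equals
    # the norm of the coordinate vector basis @ feature.
    coords = basis @ feature
    return float(np.linalg.norm(coords) / np.linalg.norm(feature))


def percentile_rank(values, x: float) -> float:
    """Percentage of `values` strictly below `x` (one of several percentile conventions)."""
    return float(100.0 * np.mean(np.asarray(values) < x))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model = 512                                   # toy width, not the real model's
    probes = rng.standard_normal((171, d_model))    # stand-in for 171 emotion probes
    decoder = rng.standard_normal((1000, d_model))  # stand-in for SAE feature directions
    basis = emotion_basis(probes)
    fractions = np.array([subspace_fraction(f, basis) for f in decoder])
    print(basis.shape, percentile_rank(fractions, fractions[0]))
```

The SVD route, rather than using the raw probe vectors directly, is one way to handle probes that are correlated or linearly dependent: only the independent directions survive, so the projection is well defined. With real data, one would compute the fraction for every feature and then call `percentile_rank` on the full array to place a given feature, such as #11100, within the distribution.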