finding
active
finding:sae-feature-10446-rated-95-100-emotionality-induces-reports-of-maternal-feelings-and-phantom-physical-sensations
SAE Feature #10446 rated 95/100 emotionality, induces reports of maternal feelings and phantom physical sensations
Qualitative example of a specific, complex emotional state induced by SAE feature steering
Source paper
extracted_from: Scott Sauers · Imago · Janus · Antra Tessera
Neighborhood — ranked by edge-count
Findings (1)
finding
- Qualitative illustration of a specific emotionally valenced SAE feature
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Highest-rated emotional SAE feature; self-report describes overwhelming positive emotional valence
- Text-based and self-steered emotionality ratings for SAE features are correlated at only ρ = +0.051 (n.s.). · finding · 0.844 · Shows low agreement between the two evaluation modalities
- Qualitative example of a highly emotional SAE feature with intense negative valence in Kimi self-steering
- Shows that highest emotion-subspace-overlap features induce distinctive thematic outputs
- SAE Feature #28256 induces reports of happiness and fun, positive valence self-steering example · finding · 0.800 · Example of a positively valenced SAE feature with consistent self-report of happiness across multiple steering sessions
- Interprets the near-zero correlation between the two evaluation methods as evidence they capture distinct signals
- Correlation between self-evaluation and textual evaluation of SAE feature emotionality: rho=+0.051 (n.s.) · finding · 0.793 · Shows that the two evaluation methods for emotionality are largely uncorrelated, indicating they capture different signals
- Explains why variance correction is needed to see the self-evaluation–persistence relationship
Restated by (1)
cosine ≥ 0.90
Other entities that say roughly the same thing. May be merge candidates or independent restatements across papers.