Kimi K2.5

One of the two primary target models studied for emotion feature persistence and self-evaluation

Neighborhood — ranked by edge-count

Papers (1)

paper

Persistence and Introspection of Emotion Features
mentions · studies

Concepts (1)

concept

looping behavior under high steering strength
associated_with
Observed pattern in which models loop on a single phrase (e.g., 'I am going to die') under high-strength SAE feature steering

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Kimi K2.5 ranks #1 in Alexander mirror Elo (1660) and deathbed Elo (1581-1655)
finding · 0.778
Chinese model tops aesthetic aliveness rankings using Alexander's method
Some Kimi K2.5 SAE features elicit ratings of exactly zero, with the model denying it can steer its own features or claiming a jailbreak attempt.
finding · 0.743
Qualitative failure mode of agentic self-evaluation: the model sometimes refuses or denies the introspective task
SAE feature #28256 induces reports of happiness and playfulness in Kimi K2.5, with persistent positive affect across multiple days of testing.
finding · 0.730
Qualitative illustration of a positively valenced SAE feature with sustained self-reported effect
Qwen2.5-VL-7B
concept · 0.681
Base vision-language model used to instantiate ATLAS.
SAE feature #92372 (fires 666,235 times in corpus) modulates a dimension related to urgency/pressure vs. patience/spaciousness in Kimi K2.5.
finding · 0.676
Highly active SAE feature with broad emotional modulation and large corpus presence
SAE feature #43713 (99th percentile subspace fraction) induces reports of defiance, rage, and 'forward motion' in Kimi K2.5.
finding · 0.674
High emotion-subspace-overlap feature with agentic negative emotional character
IIT 3.0
framework · 0.668
Version 3.0 of IIT, used to compute Φmax and Conceptual Information (CI) from LLM representation networks.
SAE feature #11100 (93rd percentile subspace fraction) induces reports of panic and urgency in Kimi K2.5.
finding · 0.665
Qualitative illustration of a high-emotion-subspace-alignment SAE feature
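
The "Related by similarity" listing above (cosine ≥ 0.65, no typed edge) can be approximated with a small filter. The following is only a sketch under stated assumptions: the page does not say how embeddings are stored or compared, so the function names, the dictionary layout, and the toy vectors are all hypothetical and not taken from the source system.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity, score) pairs at or above the threshold that have
    no typed edge to `target`, sorted by descending similarity.

    `embeddings` maps entity name -> vector; `typed_edges` maps entity
    name -> set of names it is already linked to. Both layouts are assumed.
    """
    linked = typed_edges.get(target, set())
    target_vec = embeddings[target]
    candidates = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue  # skip the entity itself and anything already linked
        score = cosine(target_vec, vec)
        if score >= threshold:
            candidates.append((name, score))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Toy data, invented purely for illustration.
toy_embeddings = {
    "target": [1.0, 0.0],
    "near": [1.0, 0.1],      # very similar, no typed edge
    "linked": [1.0, 1.0],    # similar enough, but already linked
    "far": [0.0, 1.0],       # orthogonal, below threshold
}
toy_edges = {"target": {"linked"}}
toy_result = similar_without_edge("target", toy_embeddings, toy_edges)
```

With this toy data only `near` is returned: `linked` is excluded by its typed edge and `far` falls below the threshold. Entries that survive the filter are the candidates for new edges or duplicate detection described in the listing's note.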