concept
active
concept:kv-state-persistence-across-steered-tokens
KV state persistence across steered tokens
The causal steering experiment retains the key-value (KV) state computed over the steered tokens, so downstream effects can be observed after steering has stopped.
Neighborhood — ranked by edge-count
Methods (1)
method
- Causal intervention: applying a 5-token steering pulse at the start of a model turn to measure downstream persistence of emotion feature activation
Concepts (1)
concept
- KV State Persistence (related_to): The key-value cache from steered tokens is retained during the no-steering continuation, allowing the causal effect of steering to propagate
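
The mechanism described above can be illustrated with a toy model. The following is a minimal sketch, not the experiment's actual code: it assumes a single-head causal attention layer with random weights, a fixed input sequence instead of sampled tokens, and a steering vector added to the layer input. The names `ToyAttentionLayer`, `run`, and `pulse_len` are hypothetical. It shows that when the keys and values cached during a 5-token pulse are kept, later unsteered tokens still attend to them and their outputs shift.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class ToyAttentionLayer:
    """Single-head causal attention with an explicit key-value cache (toy, hypothetical)."""

    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        self.Wq = rng.normal(size=(d_model, d_model)) * scale
        self.Wk = rng.normal(size=(d_model, d_model)) * scale
        self.Wv = rng.normal(size=(d_model, d_model)) * scale
        self.keys = []    # cached keys, one entry per processed token
        self.values = []  # cached values, one entry per processed token

    def step(self, x, steer=None):
        # Steering (assumed here to be an additive vector) changes the input,
        # and therefore the key and value written to the cache for this token.
        if steer is not None:
            x = x + steer
        self.keys.append(x @ self.Wk)
        self.values.append(x @ self.Wv)
        q = x @ self.Wq
        K = np.stack(self.keys)
        V = np.stack(self.values)
        weights = softmax(K @ q / np.sqrt(x.size))
        return weights @ V

def run(inputs, steer, pulse_len):
    """Steer only the first `pulse_len` tokens; the cache is never cleared."""
    layer = ToyAttentionLayer(inputs.shape[1])
    outputs = []
    for t, x in enumerate(inputs):
        outputs.append(layer.step(x, steer if t < pulse_len else None))
    return np.stack(outputs)

rng = np.random.default_rng(1)
inputs = rng.normal(size=(20, 8))
steer = 2.0 * rng.normal(size=8)

baseline = run(inputs, steer, pulse_len=0)  # no steering at all
steered = run(inputs, steer, pulse_len=5)   # 5-token pulse, then no steering

# Largest output difference on tokens processed after the pulse ended.
late_shift = np.abs(steered[5:] - baseline[5:]).max()
print(late_shift)
```

In this sketch the unsteered tokens receive identical inputs in both runs, so any difference after token 5 comes only from the steered entries still present in the cache.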
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Janus's claim about KV caching as an introspection mechanism.
- Measures emotion feature persistence as correlation between z-scored activation at token 0 and token 100 across all eligible target model tokens
- Baseline steering method that applies intervention at every token generation step, shown to degrade performance at high strengths
- Applies a 5-token steering pulse to each emotion probe and measures persistence of causal effect via contrast z-score over 200 subsequent tokens
- The property of emotion features maintaining elevated activation well beyond the local token context that triggered them
- Acknowledged limitation of restricting experiments to 64-token completions
- Replicates main result using in-distribution steering vector; addresses concern about pre-trained vector validity.
- Validates that steering vectors capture reflection semantics by finding tokens reported in related work.
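
One of the neighbors above describes persistence as a correlation between z-scored activations at token 0 and token 100. A minimal sketch of that computation follows, assuming one activation value per sample at each of the two positions and z-scoring across samples within each position; the function name `persistence_correlation` and the input layout are assumptions, not taken from the source.

```python
import numpy as np

def zscore(x):
    # Standardize across samples (population standard deviation).
    return (x - x.mean()) / x.std()

def persistence_correlation(act_start, act_later):
    """Pearson correlation computed as the mean product of z-scores."""
    z0 = zscore(np.asarray(act_start, dtype=float))
    z1 = zscore(np.asarray(act_later, dtype=float))
    return float(np.mean(z0 * z1))
```

A value near 1 would indicate that samples with high feature activation at the first position still tend to have high activation at the later one; a value near 0 would indicate no linear relationship.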