Activation velocity

A measure of cumulative drift in a model's internal representations across conversation turns, introduced by Das & Fioretto (2026).

Neighborhood — ranked by edge-count

paper

thinker

Saswat Das
introduces
Introduced the activation velocity measure for cumulative internal drift across conversation turns.

concept

Persona drift
extends
Behavioural drift in multi-turn LLM interaction; documented in prior work for persona, identity, and instruction-following

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Activations · concept · 0.752
Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
Activation Compression · concept · 0.747
Key capability: covariance pooling compresses gigabytes of activations into compact stable embeddings without large labeled datasets.
Activation Steering · method · 0.731
Causal intervention technique: edit NLA explanation, reconstruct via AR, use difference as steering vector to manipulate model behavior.
Activation space · concept · 0.721
Representation space on which linear probes operate to attribute harmful behaviors to training data.
Activation Probing · concept · 0.719
Technique of reading out model beliefs from internal activations before the final answer token is generated.
Activation Correlation · method · 0.718
Pearson correlation of feature activations across 40M tokens, used to measure feature similarity and universality across models.
Activation Similarity · concept · 0.713
Model-independent feature comparison based on correlating activation vectors across a fixed, diverse dataset.
Activation Addition · method · 0.712
Intervention method that adds a learned direction vector to residual stream activations to steer model behavior.