concept
active
concept:activation-velocity
Activation velocity
Measure of cumulative drift in a model's internal representations across conversation turns, introduced by Das & Fioretto (2026)
Neighborhood — ranked by edge-count
Papers (1)
paper
Thinkers (1)
thinker
- Saswat Das · introduces · Introduced the activation velocity measure for cumulative internal drift across conversation turns
Concepts (1)
concept
- Persona drift · extends · Behavioural drift in multi-turn LLM interaction; documented in prior work for persona, identity, and instruction-following
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
- Key capability: covariance pooling compresses gigabytes of activations into compact stable embeddings without large labeled datasets.
- Causal intervention technique: edit NLA explanation, reconstruct via AR, use difference as steering vector to manipulate model behavior.
- Representation space on which linear probes operate to attribute harmful behaviors to training data.
- Technique of reading out model beliefs from internal activations before the final answer token is generated
- Pearson correlation of feature activations across 40M tokens used to measure feature similarity and universality across models
- Model-independent feature comparison based on correlating activation vectors across a fixed diverse dataset
- Intervention method that adds a learned direction vector to residual stream activations to steer model behavior