Sense Vectors

Vectors acquired during pretraining in Backpack LMs that have a multiplication effect on model generation

Neighborhood — ranked by edge-count

framework

Backpack Language Models
implements
LM architecture with sense vectors showing multiplication effects, illustrating custom intervention in pyvene

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

steering vectors · concept · 0.813
A method for modifying model behavior by adding perturbation vectors to activations, used here to try to reduce eval awareness.
concept vector · concept · 0.795
Computed directional vector in activation space representing a specific concept, used for injection experiments
Function Vector · concept · 0.779
Type of steering vector enabling zero-shot task execution, cited from Todd et al. 2024
Persona Vectors (Chen et al.) · framework · 0.759
Prior framework for monitoring and controlling character traits in LLMs via activation directions; this paper extends it to 275 roles
Deception Vector · concept · 0.757
Extracted steering vector capturing semantic dimension of strategic deception in moral dilemmas in Experiment 1
concept vector computation · method · 0.752
Procedure extracting concept vectors as difference of mean activations between concept-exemplifying and baseline/negative sentences
Vectorial Force · concept · 0.749
The directional tension and energy within the layout, giving the page dynamic qualities.
Refusal Vector · concept · 0.737
Single linear direction mediating refusal behavior in LLMs, shown by Arditi et al.; related to but distinct from the Assistant Axis
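
Several neighbors above describe the same basic recipe: take the difference of mean activations between concept and baseline examples (concept vector computation), then add the resulting vector to an activation (steering vectors). The sketch below illustrates that recipe on toy data, using only the Python standard library. The function names, the `scale` parameter, and the three-dimensional toy activations are assumptions made for illustration; none of them come from Backpack LMs, pyvene, or the cited papers. The `cosine` helper is likewise only a guess at the kind of similarity score behind the neighborhood ranking.

```python
import math

def mean_vector(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def concept_vector(concept_acts, baseline_acts):
    """Difference of means: mean(concept activations) - mean(baseline activations)."""
    mu_concept = mean_vector(concept_acts)
    mu_baseline = mean_vector(baseline_acts)
    return [c - b for c, b in zip(mu_concept, mu_baseline)]

def steer(activation, vector, scale=1.0):
    """Add a scaled perturbation vector to one activation (hypothetical helper)."""
    return [a + scale * v for a, v in zip(activation, vector)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy activations (assumed values): the concept examples differ from the
# baseline examples only along the first dimension.
concept_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
baseline_acts = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]

vec = concept_vector(concept_acts, baseline_acts)
steered = steer([0.5, 0.5, 0.5], vec, scale=2.0)
similarity = cosine(vec, [1.0, 0.0, 0.0])
```

In a real model the activations would be hidden states collected at a chosen layer, and the scale would need tuning. This toy version only shows the arithmetic.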