Single-prompt concept vector extraction

Method that obtains a concept vector from a single prompt: take the model's activations on 'Tell me about {word}' and subtract the mean of the activations obtained with the same prompt for other random words.
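The subtraction described above can be sketched as follows. This is a minimal illustration, not the original implementation: the function names, the `get_activation` callable, and the toy activation function are all hypothetical, and the source does not say which layer or token position the activations are read from. A real setup would replace `toy_activation` with a hook that reads activations from the model.

```python
from typing import Callable, Sequence

import numpy as np

PROMPT_TEMPLATE = "Tell me about {word}"


def extract_concept_vector(
    word: str,
    baseline_words: Sequence[str],
    get_activation: Callable[[str], np.ndarray],
) -> np.ndarray:
    """Activation for `word` minus the mean activation over baseline words.

    `get_activation` is a hypothetical callable that maps a prompt string to
    a 1-D activation vector (layer and token position are left to the caller).
    """
    target = get_activation(PROMPT_TEMPLATE.format(word=word))
    baseline = np.mean(
        [get_activation(PROMPT_TEMPLATE.format(word=w)) for w in baseline_words],
        axis=0,
    )
    return target - baseline


def toy_activation(prompt: str, dim: int = 8) -> np.ndarray:
    """Stand-in for a real model: a deterministic pseudo-random vector per prompt."""
    seed = sum(ord(c) for c in prompt)  # same prompt always gives the same seed
    return np.random.default_rng(seed).standard_normal(dim)


if __name__ == "__main__":
    vec = extract_concept_vector("ocean", ["table", "river", "justice"], toy_activation)
    print(vec.shape)
```

Subtracting the baseline mean removes whatever the prompt template contributes regardless of the word, so the remaining vector is mostly specific to the chosen word.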

Neighborhood — ranked by edge-count

paper

method

Activation Steering
implements
Causal intervention technique: edit the NLA explanation, reconstruct it via AR, and use the difference as a steering vector to manipulate model behavior.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Contrastive concept vector extraction · method · 0.829
Method for obtaining concept vectors by subtracting activations from two contrasting prompts.
concept vector · concept · 0.790
Computed directional vector in activation space representing a specific concept, used for injection experiments.
concept vector computation · method · 0.771
Procedure extracting concept vectors as difference of mean activations between concept-exemplifying and baseline/negative sentences.
Role Vector Extraction · method · 0.726
Pipeline for extracting mean post-MLP residual stream activations from model responses under persona-specific system prompts to produce role vectors.
concept representation · concept · 0.703
How a neural network encodes a semantic concept internally, argued to be better captured by manifolds than by atomic features.
Sense Vectors · concept · 0.701
Vectors acquired during pretraining in Backpack LMs that have a multiplication effect on model generation.
Concept Algebra · framework · 0.700
Probabilistic framework formalizing concept-specific subspaces for targeted steering in generative models.
Distinguishing Injected Concepts from Text Inputs · finding · 0.698
Models retain the ability to accurately transcribe input text while simultaneously reporting on injected thoughts; all models perform above chance, with Opus 4/4.1 performing best.