Contrastive concept vector extraction

Method for obtaining a concept vector by subtracting the activations elicited by one prompt from those elicited by a contrasting prompt.
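
A minimal sketch of the subtraction, assuming the activations for each prompt have already been read out at the same layer and token position as plain lists of floats. The function name `contrastive_concept_vector` and the toy values are hypothetical illustrations, not part of the source entry.

```python
from typing import Sequence


def contrastive_concept_vector(
    pos_activation: Sequence[float],
    neg_activation: Sequence[float],
) -> list[float]:
    """Return pos - neg, element-wise.

    Both inputs are assumed to come from the same layer and token
    position, one per contrasting prompt (hypothetical setup).
    """
    if len(pos_activation) != len(neg_activation):
        raise ValueError("activations must have the same dimensionality")
    return [p - n for p, n in zip(pos_activation, neg_activation)]


# Toy 4-dimensional activations standing in for real model states.
act_with_concept = [0.5, 2.0, -1.0, 3.0]
act_without_concept = [0.5, 1.0, -1.0, 1.0]

concept_vector = contrastive_concept_vector(act_with_concept, act_without_concept)
print(concept_vector)
```

Dimensions where the two prompts agree cancel to zero, so only the directions that differ between the prompts survive in the result.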

Neighborhood — ranked by edge-count

method

Activation Steering
implements
Causal intervention technique: edit the NLA explanation, reconstruct it via AR, and use the difference as a steering vector to manipulate model behavior.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
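
The filter described above could be sketched roughly as follows, assuming each entity has an embedding vector and that the names already linked by a typed edge are available as a set. The 0.65 threshold comes from the heading; the function names, the data layout, and the toy embeddings are hypothetical.

```python
import math
from typing import Mapping, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def untyped_neighbors(
    target: str,
    embeddings: Mapping[str, Sequence[float]],
    typed_edges: set[str],
    threshold: float = 0.65,
) -> list[tuple[str, float]]:
    """Entities at or above the threshold that have no typed edge to target,
    ranked by descending cosine similarity."""
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in typed_edges:
            continue  # skip the entity itself and already-linked entities
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional embeddings (hypothetical).
embeddings = {
    "target": [1.0, 0.0],
    "near": [4.0, 3.0],    # cosine 0.8, no typed edge: kept
    "far": [0.0, 1.0],     # cosine 0.0: below threshold
    "linked": [1.0, 0.0],  # cosine 1.0 but already has a typed edge
}
candidates = untyped_neighbors("target", embeddings, typed_edges={"linked"})
print(candidates)
```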

Single-prompt concept vector extraction · method · 0.829
Method using activations from the prompt 'Tell me about {word}' minus mean over other random words to obtain concept vectors.
concept vector · concept · 0.819
Computed directional vector in activation space representing a specific concept, used for injection experiments.
concept vector computation · method · 0.811
Procedure extracting concept vectors as difference of mean activations between concept-exemplifying and baseline/negative sentences.
Contrastive analysis · method · 0.778
Method comparing brain activity in conscious vs. unconscious conditions.
Contrastive Steering Vector Construction · method · 0.778
Method for computing steering vectors as mean activation differences between reflection levels at a given layer.
Role Vector Extraction · method · 0.750
Pipeline for extracting mean post-MLP residual stream activations from model responses under persona-specific system prompts to produce role vectors.
Contrastive Feature Retrieval Pipeline · method · 0.748
A pipeline employing controlled semantic oppositions to distill monosemantic functional features from sparse activation spaces.
concept representation · concept · 0.742
How a neural network encodes a semantic concept internally, argued to be better captured by manifolds than by atomic features.
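
Several of the neighbors listed above describe mean-based variants of the same subtraction: a difference of mean activations between two sets of sentences, or a single prompt's activation minus the mean over baseline words. A hedged sketch that covers both cases, assuming activations are already available as lists of floats; `mean_vector`, `mean_difference_vector`, and the toy values are hypothetical.

```python
from typing import Sequence


def mean_vector(rows: Sequence[Sequence[float]]) -> list[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    dim = len(rows[0])
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


def mean_difference_vector(
    positive: Sequence[Sequence[float]],
    baseline: Sequence[Sequence[float]],
) -> list[float]:
    """Mean of the positive activations minus mean of the baseline activations."""
    pos_mean = mean_vector(positive)
    base_mean = mean_vector(baseline)
    return [p - b for p, b in zip(pos_mean, base_mean)]


# Single-prompt case: one activation for the target word,
# several activations for other random words (toy values).
target_activation = [[2.0, 4.0]]
random_word_activations = [[1.0, 1.0], [3.0, 1.0]]

single_prompt_vector = mean_difference_vector(
    target_activation, random_word_activations
)
print(single_prompt_vector)
```

Passing several vectors as `positive` gives the difference-of-means form; passing exactly one gives the single-prompt form.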