Contrastive system prompt completions

Training method for probes: generate completions under opposing system prompts to induce positive and negative poles of a concept

Neighborhood — ranked by edge-count

method

Contrastive mean-difference probe
implements
Probe construction method: concept vector at each layer is L2-normalized difference between mean positive and mean negative representations from contrastive system prompts

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Contrastive learningframework0.727
Supervised learning framework where system learns by observing contrast between current response and nudged improved response; requires weak additional forces from supervisor
comfortable completionconcept0.720
A sense of being complete and comfortable, as in the friendly house edge, that enhances life.
Contrastive Stimulus Designconcept0.713
LAT methodology step constructing paired prompts that elicit divergent behaviors to extract steering vectors
Contrastconcept0.706
The property that living structures contain intense contrast—far more than one imagines helpful; true opposites which annihilate each other when superimposed, creating differentiation that gives birth to something; contrast unifies rather than separates when used correctly
Contrast-Consistent Searchmethod0.706
Unsupervised probing method from Burns et al. 2023 that identifies directions along which contrast pair representations are far apart
Contrast-Consistent Search (CCS)concept0.706
Unsupervised probe by Burns et al. to predict latent truth representations; cited as related but limited in generalization
Four best contrastive prompt pairs outperform full 16-pair average steering vector for type hint suppressionfinding0.705
Optimization result for steering vector construction.
Contrastive concept vector extractionmethod0.701
Method for obtaining concept vectors by subtracting activations from two contrasting prompts.