concept
active
concept:contrastive-system-prompt-completions

Contrastive system prompt completions

Training method for probes: generate completions under opposing system prompts to elicit the positive and negative poles of a concept.

Neighborhood — ranked by edge-count

Methods (1)

method
  • Probe construction method: the concept vector at each layer is the L2-normalized difference between the mean positive and mean negative representations obtained from contrastive system prompts
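
The construction described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes the per-layer representations have already been extracted and are stored as plain lists of floats keyed by layer index, and the function names (mean_vector, concept_vector, concept_vectors_by_layer) and the toy data are hypothetical. Plain Python lists are used only to keep the sketch dependency-free.

```python
import math


def mean_vector(rows):
    """Element-wise mean of equally sized activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def concept_vector(pos_acts, neg_acts, eps=1e-12):
    """L2-normalized difference between the positive and negative means for one layer."""
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    diff = [p - q for p, q in zip(mu_pos, mu_neg)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm < eps:
        # Assumption: treat coinciding means as an error rather than returning zeros.
        raise ValueError("positive and negative means coincide; no direction to extract")
    return [d / norm for d in diff]


def concept_vectors_by_layer(pos_by_layer, neg_by_layer):
    """Apply the same construction independently at every layer."""
    return {
        layer: concept_vector(pos_by_layer[layer], neg_by_layer[layer])
        for layer in pos_by_layer
    }


# Toy data (hypothetical): two layers, two completions per pole, 2-d representations.
pos = {0: [[1.0, 2.0], [3.0, 2.0]], 1: [[0.0, 4.0], [0.0, 6.0]]}
neg = {0: [[-1.0, 2.0], [-3.0, 2.0]], 1: [[0.0, 1.0], [0.0, 3.0]]}
vectors = concept_vectors_by_layer(pos, neg)
```

Each entry of vectors is a unit-length direction for one layer. How the direction is then used as a probe (for example, by projecting new representations onto it) is not specified here.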

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Contrastive learning · framework · 0.727
    Supervised learning framework in which the system learns by observing the contrast between its current response and a nudged, improved response; requires weak additional forces from the supervisor
  • A sense of being complete and comfortable, as in the friendly house edge, that enhances life.
  • LAT methodology step constructing paired prompts that elicit divergent behaviors to extract steering vectors
  • Contrast · concept · 0.706
    The property that living structures contain intense contrast—far more than one imagines helpful; true opposites which annihilate each other when superimposed, creating differentiation that gives birth to something; contrast unifies rather than separates when used correctly
  • Unsupervised probing method from Burns et al. 2023 that identifies directions along which contrast pair representations are far apart
  • Unsupervised probe by Burns et al. to predict latent truth representations; cited as related but limited in generalization
  • Optimization result for steering vector construction.
  • Method for obtaining concept vectors by subtracting activations from two contrasting prompts.