Contrastive Stimulus Design

A step in the LAT methodology: constructing paired prompts that elicit divergent behaviors, so that steering vectors can be extracted from the resulting activation differences.

Neighborhood — ranked by edge-count

method

Linear Artificial Tomography (LAT)
uses
Method for extracting deception steering vectors via PCA on contrastive activation differences; achieves 89% detection accuracy
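
As a rough illustration of the PCA step in the description above, the sketch below takes a few made-up activation differences (one row per contrastive pair) and recovers their first principal component by power iteration. The function name `top_principal_direction`, the toy numbers, and the use of plain Python lists are assumptions for illustration only. A real pipeline would read hidden states from a model and would likely use an array library. The sketch does not reproduce the reported accuracy figure.

```python
import math
import random


def top_principal_direction(diffs, iters=200, seed=0):
    """Return the unit-norm first principal component of the rows in `diffs`."""
    n, d = len(diffs), len(diffs[0])
    # Center each dimension so the component reflects variance, not the mean.
    mean = [sum(row[j] for row in diffs) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in diffs]

    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        # Apply X^T X to v without forming the covariance matrix explicitly.
        proj = [sum(r[j] * v[j] for j in range(d)) for r in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical differences (activation on prompt A minus activation on prompt B),
# with pair order alternated so the signal shows up as variance along one axis.
diffs = [
    [2.0, 0.1, -0.1],
    [-1.8, -0.1, 0.0],
    [2.2, 0.0, 0.1],
    [-2.1, 0.1, 0.1],
]
direction = top_principal_direction(diffs)
```

The sign of a principal component is arbitrary, so in practice the direction would still need to be oriented against labeled examples before it is used for detection or steering.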

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Contrastive Pairs · concept · 0.783
Pairs of prompts at different reflection levels used to compute steering vectors.
Contrast · concept · 0.782
The property that living structures contain intense contrast, far more than one would imagine helpful: true opposites that annihilate each other when superimposed, creating the differentiation that gives birth to something. Used correctly, contrast unifies rather than separates.
Contrastive learning · framework · 0.768
Supervised learning framework in which the system learns by observing the contrast between its current response and a nudged, improved response; requires weak additional forces from the supervisor.
Contrastive analysis · method · 0.756
Method comparing brain activity in conscious vs. unconscious conditions.
Contrast Pairs · concept · 0.735
Pairs of statements with opposite truth values, used as input to CCS; e.g., the paired cities and neg_cities statements.
Contrastive Activation Steering · method · 0.732
Core technique: takes the mean difference of model activations on contrastive prompts and adds the resulting vector to the residual stream at inference time.
Contrastive mean-difference probe · method · 0.731
Probe construction method: the concept vector at each layer is the L2-normalized difference between the mean positive and mean negative representations obtained from contrastive system prompts.
Contrast-Consistent Search · method · 0.730
Unsupervised probing method from Burns et al. 2023 that identifies directions along which contrast pair representations are far apart.
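
Several of the entries above share one operation: a difference of mean activations over contrastive prompts. The sketch below is a minimal version, assuming activations are already available as plain lists of floats. The helper names, the toy values, and the scale `alpha` are hypothetical. It applies the L2 normalization described for the mean-difference probe; the steering entry describes adding the mean difference itself, which corresponds to skipping the normalization step.

```python
import math


def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_difference_direction(pos, neg):
    """L2-normalized difference between mean positive and mean negative activations."""
    diff = [p - q for p, q in zip(mean_vector(pos), mean_vector(neg))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("positive and negative means coincide; no direction to extract")
    return [x / norm for x in diff]


def steer(hidden, direction, alpha):
    """Add a scaled direction to one hidden-state vector (alpha is a hypothetical scale)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations for the two sides of the contrast (illustrative values only).
pos_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
neg_acts = [[0.0, 2.0, 0.0], [-2.0, 2.0, 0.0]]

concept = mean_difference_direction(pos_acts, neg_acts)
steered = steer([0.5, 0.5, 0.5], concept, alpha=2.0)
```

In this toy case the two sides differ only along the first dimension, so `concept` points along that axis and steering shifts only the first coordinate of the hidden state.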