method
active
method:causal-contrast-z-score
Causal Contrast Z-Score
Per-(emotion, token) z-score: the injected emotion's probe activation minus the mean activation of the 170 other probes, contrasted against a no-steering baseline.
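A minimal sketch of one way to compute this score, in plain Python. The entry does not spell out how the contrast is standardized, so this sketch assumes that the baseline contrast (target probe minus the mean of the other probes) is collected over the tokens of a no-steering run and that its mean and standard deviation define the z-score. The function names and the `[probe][token]` layout are hypothetical.

```python
from statistics import mean, pstdev

def contrast(acts, emotion, token):
    """Target probe activation minus the mean of all other probes at one token.

    `acts` is a list of per-probe activation lists, indexed [probe][token].
    With 171 probes this averages over the 170 non-target probes.
    """
    others = [row[token] for i, row in enumerate(acts) if i != emotion]
    return acts[emotion][token] - mean(others)

def contrast_z(steered, baseline, emotion, token):
    """Z-score of the steered contrast against the no-steering contrasts.

    ASSUMPTION: the baseline distribution is the contrast over all tokens
    of the no-steering run; the source does not specify this.
    """
    n_tokens = len(baseline[emotion])
    base = [contrast(baseline, emotion, t) for t in range(n_tokens)]
    sd = pstdev(base)
    if sd == 0:
        raise ValueError("baseline contrast has zero variance")
    return (contrast(steered, emotion, token) - mean(base)) / sd
```

Subtracting the mean of the other probes removes activation shifts shared by all probes, so the score reflects what is specific to the injected emotion.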
Neighborhood — ranked by edge-count
Methods (2)
method
- Applies a 5-token steering pulse to each emotion probe and measures persistence of causal effect via contrast z-score over 200 subsequent tokens
- Multiple testing correction applied to significance tests of emotion persistence and self-evaluation word associations
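The second entry above does not say which multiple testing correction is applied. As an illustration only, the sketch below uses the Benjamini-Hochberg procedure, one common choice for controlling the false discovery rate across many tests; the function name and the default `alpha` are assumptions, not taken from the source.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans, True where a hypothesis is rejected at FDR level alpha.

    ASSUMPTION: Benjamini-Hochberg is shown as an example; the source does
    not name the correction it uses.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

Applied to one p-value per (emotion, test) pair, the returned flags mark which effects remain significant after correction.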
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The property that living structures contain intense contrast—far more than one imagines helpful; true opposites which annihilate each other when superimposed, creating differentiation that gives birth to something; contrast unifies rather than separates when used correctly
- Calibration protocol: whiten embeddings on dev pool, z-score ρd and dr per layer.
- Pairs of statements with opposite truth values used as input to CCS; e.g., cities and neg_cities paired statements
- Statistical regularities in sensorium learned by perceptual and value mechanisms.
- Probe method combining causal interventions and structural analysis, supported by pyvene's activation collection
- Standardization of ρd, dr, and log k on dev set for computing S.
- Adaptation of Hewitt and Liang control tasks to CausalGym: next-token labels replaced with arbitrary tokens to measure method expressivity
- Pearson correlation of feature activations across 40M tokens used to measure feature similarity and universality across models