method
active
method:causal-contrast-z-score

Causal Contrast Z-Score

Per-(emotion, token) z-score computed from the injected emotion's probe activation minus the mean activation of the 170 other probes, contrasted against a no-steering baseline.
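A minimal sketch of the definition above follows. The source gives only the contrast (injected emotion minus the mean of the other probes) and the no-steering baseline. The normalization used here is an assumption: each token's steered contrast is standardized against the mean and standard deviation of that token's contrast across several unsteered baseline runs. The data layout (one `{probe: activation}` dict per token) and the function names are hypothetical. In the source the mean runs over 170 other probes; the toy test uses three.

```python
from statistics import mean, stdev


def contrast(acts, emotion):
    """Injected emotion's probe activation minus the mean of all other probes.

    acts: hypothetical layout, a dict mapping probe name -> activation at one token.
    """
    others = [v for k, v in acts.items() if k != emotion]
    return acts[emotion] - mean(others)


def causal_contrast_z(steered, baselines, emotion):
    """Per-token z-score of the steered contrast against no-steering runs.

    steered:   list over tokens of {probe: activation} from the steered run.
    baselines: list of unsteered runs, each shaped like `steered`
               (needs at least two runs so stdev is defined).
    ASSUMPTION: z_t = (c_steered_t - mean(c_base_t)) / stdev(c_base_t).
    """
    zs = []
    for t, acts in enumerate(steered):
        c = contrast(acts, emotion)
        # Contrast for the same emotion and token position in each baseline run
        base = [contrast(run[t], emotion) for run in baselines]
        zs.append((c - mean(base)) / stdev(base))
    return zs
```

Standardizing per token position, rather than pooling all tokens, keeps the score comparable as the steering effect decays along the sequence.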

Neighborhood: ranked by edge count

Methods (2)

  • Applies a 5-token steering pulse for each emotion probe and measures how long the causal effect persists, using the contrast z-score over the 200 subsequent tokens
  • Applies multiple-testing correction to the significance tests of emotion persistence and of self-evaluation word associations
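The entry above does not say which multiple-testing correction is used. As one illustrative choice, and not necessarily the method applied here, the Benjamini-Hochberg step-up procedure controls the false discovery rate across many per-emotion tests; Bonferroni (compare each p-value to `alpha / m`) would be the stricter alternative. The function name and the default `alpha` are assumptions.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg FDR control (illustrative choice, not confirmed by the source).

    Returns a list of booleans aligned with `pvals`: True where the null is rejected.
    """
    m = len(pvals)
    # Indices of p-values in ascending order of p
    order = sorted(range(m), key=lambda i: pvals[i])
    # Step-up: find the largest 1-based rank k with p_(k) <= (k / m) * alpha
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose rank is at or below k_max
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject
```

Because the procedure steps up, a p-value that misses its own threshold (0.03 against 0.025 for rank 2 in `[0.01, 0.03, 0.035, 0.04]`) is still rejected when a larger-ranked p-value passes.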

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Contrast (concept, 0.738)
    The property that living structures contain intense contrast—far more than one imagines helpful; true opposites which annihilate each other when superimposed, creating differentiation that gives birth to something; contrast unifies rather than separates when used correctly
  • Calibration protocol: whiten embeddings on dev pool, z-score ρd and dr per layer.
  • Contrast Pairs (concept, 0.719)
    Pairs of statements with opposite truth values used as input to CCS; e.g., cities and neg_cities paired statements
  • Statistical regularities in sensorium learned by perceptual and value mechanisms.
  • Probe method combining causal interventions and structural analysis, supported by pyvene's activation collection
  • Standardization of ρd, dr, and log k on dev set for computing S.
  • Adaptation of Hewitt and Liang control tasks to CausalGym: next-token labels replaced with arbitrary tokens to measure method expressivity
  • Pearson correlation of feature activations across 40M tokens used to measure feature similarity and universality across models
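The "Related by similarity" list keeps entities whose cosine similarity to this node is at least 0.65 and that have no typed edge to it. A sketch of that filter follows, assuming each entity has a dense embedding vector and that typed edges are undirected pairs; the embedding source, data structures, and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_candidates(target, embeddings, typed_edges, threshold=0.65):
    """Entities with cosine >= threshold to `target` and no typed edge to it.

    embeddings:  dict name -> vector (hypothetical layout).
    typed_edges: set of frozenset({a, b}) pairs, treated as undirected (assumption).
    Returns (name, similarity) pairs sorted from most to least similar.
    """
    out = []
    for name, vec in embeddings.items():
        # Skip the node itself and anything already linked by a typed edge
        if name == target or frozenset((target, name)) in typed_edges:
            continue
        s = cosine(embeddings[target], vec)
        if s >= threshold:
            out.append((name, s))
    return sorted(out, key=lambda pair: pair[1], reverse=True)
```

Excluding already-linked entities is what makes the survivors useful as candidates for new edges or as possible duplicates.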