Causal Contrast Z-Score

Per-(emotion, token) z-score: the injected emotion's probe activation minus the mean activation of the 170 other probes, compared against a no-steering baseline
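A minimal sketch of how such a score might be computed, assuming probe activations are stored as arrays of shape (tokens, probes) with 171 probes in total (the injected emotion plus the 170 others). The function name `contrast_z`, the array layout, and the choice to standardize by the mean and standard deviation of the baseline contrast are illustrative assumptions; the source does not specify the exact normalization.

```python
import numpy as np

def contrast_z(steered, baseline, emotion_idx):
    """Hypothetical contrast z-score per token for one emotion probe.

    steered, baseline: arrays of shape (n_tokens, n_probes) holding probe
    activations with and without steering (assumed layout).
    """
    def contrast(acts):
        # Target probe activation minus the mean of all other probes.
        others = np.delete(acts, emotion_idx, axis=1)
        return acts[:, emotion_idx] - others.mean(axis=1)

    c_steer = contrast(steered)
    c_base = contrast(baseline)
    # Assumption: standardize against the no-steering contrast distribution.
    mu = c_base.mean()
    sigma = c_base.std(ddof=1)
    return (c_steer - mu) / sigma
```

Subtracting the mean of the other probes removes activation shifts shared by all probes, so the score reflects only the change specific to the injected emotion.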

Neighborhood — ranked by edge-count

method

5-Token Steering Pulse Experiment
uses
Applies a 5-token steering pulse to each emotion probe and measures the persistence of the causal effect via the contrast z-score over the 200 subsequent tokens
Benjamini-Hochberg FDR correction
uses
Multiple testing correction applied to significance tests of emotion persistence and self-evaluation word associations
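For reference, the Benjamini-Hochberg step-up procedure can be sketched as below. This is a generic implementation of the standard procedure, not the pipeline used in the experiment; the function name and the `alpha=0.05` default are assumptions, since the source does not state the FDR level.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return (reject, adjusted_p) under Benjamini-Hochberg FDR control."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted_sorted = np.clip(adjusted_sorted, 0.0, 1.0)
    # Restore the original ordering of the tests.
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted <= alpha, adjusted
```

A test is declared significant when its adjusted p-value is at or below `alpha`, which bounds the expected fraction of false discoveries among the rejected tests.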

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Contrast · concept · 0.738
The property that living structures contain intense contrast, far more than one might imagine helpful: true opposites that annihilate each other when superimposed, creating the differentiation that gives birth to something. Used correctly, contrast unifies rather than separates.
whitening and z-scoring procedure · method · 0.730
Calibration protocol: whiten embeddings on dev pool, z-score ρd and dr per layer.
Contrast Pairs · concept · 0.719
Pairs of statements with opposite truth values used as input to CCS; e.g., cities and neg_cities paired statements
causal structure of sensory contingencies · concept · 0.718
Statistical regularities in sensorium learned by perceptual and value mechanisms.
Causal Structural Probe · method · 0.715
Probe method combining causal interventions and structural analysis, supported by pyvene's activation collection
whitening and z-scoring protocol · method · 0.715
Standardization of ρd, dr, and log k on dev set for computing S.
Control task for causal evaluation · method · 0.712
Adaptation of the Hewitt and Liang control tasks to CausalGym: next-token labels are replaced with arbitrary tokens to measure method expressivity
Activation Correlation · method · 0.712
Pearson correlation of feature activations across 40M tokens used to measure feature similarity and universality across models