Causal Intervention on Representations

The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.

Neighborhood — ranked by edge-count

paper

hypothesis

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

How do interventions on representations causally steer behavior? · question · 0.865
Core question motivating the shift from linear to geometry-aware steering; answered via manifold alignment analysis.
Causal Intervention via Activation Shifting · method · 0.833
Method of shifting hidden-state activations along probe directions so that the model treats false statements as true and vice versa; evaluated on out-of-distribution (OOD) inputs.
Causal Intervention via Activation Shift · method · 0.825
Intervening in the model's forward pass by adding or subtracting the probe direction on group (b) hidden states to flip truth judgments.
causal shaping of behavior by representation geometry · concept · 0.816
Central question: does geometry in activation space causally determine behavior?
Intervention Propagation · concept · 0.803
Property that additive modifications to activations affect all downstream computations, enabling tractable behavioral control.
Causal Mediation · concept · 0.791
Whether an internal direction causally controls a target behavior, verified by intervention success.
Divergent representations are a common, if not likely, outcome of causal interventions across a wide range of methods · claim · 0.790
Core empirical claim of the paper supported by both theoretical proof and empirical demonstration.
Causal Influence Diagrams · framework · 0.779
Framework informing path-specific objectives by identifying causal chains leading to risky behaviors.
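
The activation-shifting entries in this list describe adding or subtracting a probe direction to hidden states so that a downstream judgment flips. A minimal sketch of that idea on a toy linear readout follows. All vectors and the helpers `shift` and `judge` are hypothetical illustrations, not values or code from any of the listed papers, and a real model would apply the shift inside its forward pass rather than to a hand-written list.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    # Normalize so that alpha is measured in units of distance along the direction.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def shift(hidden, direction, alpha):
    """Return hidden + alpha * unit(direction); a negative alpha subtracts."""
    d = unit(direction)
    return [h + alpha * x for h, x in zip(hidden, d)]

def judge(hidden, readout):
    """Toy downstream computation: a linear readout thresholded at zero."""
    return "true" if dot(hidden, readout) > 0 else "false"

# Hypothetical values, not taken from any real model.
probe_direction = [1.0, 0.0, 1.0]   # assumed "truth" direction found by a probe
readout = [0.5, -0.2, 0.5]          # assumed downstream weights
h_false = [-1.0, 0.3, -0.5]         # hidden state for a statement judged false

baseline = judge(h_false, readout)

# Intervention: add the probe direction, then recompute the downstream judgment.
h_steered = shift(h_false, probe_direction, alpha=3.0)
steered = judge(h_steered, readout)

# Subtracting the same amount undoes the intervention.
restored = judge(shift(h_steered, probe_direction, alpha=-3.0), readout)

print(baseline, steered, restored)
```

Because the readout is computed from the shifted hidden state, the additive change carries through to the downstream output. A flipped judgment under intervention is evidence of a causal link; a probe that merely decodes the label well is only correlational evidence.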