concept
active
concept:causal-intervention-on-representations
Causal Intervention on Representations
The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
Neighborhood — ranked by edge-count
Papers (1)
paper
Hypotheses (1)
hypothesis
- The hypothesis that intervention, rather than correlation, is the right lens for connecting representation geometry to behavior geometry.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Core question motivating the shift from linear to geometry-aware steering; answered via manifold alignment analysis.
- Method of shifting hidden-state activations along probe directions to cause the model to treat false statements as true and vice versa; evaluated on out-of-distribution (OOD) inputs.
- Intervening in the model's forward pass by adding the probe direction to, or subtracting it from, group (b) hidden states to flip truth judgments.
- Central question: does geometry in activation space causally determine behavior?
- Property that additive modifications to activations affect all downstream computations, enabling tractable behavioral control
- Whether an internal direction causally controls a target behavior, verified by intervention success
- Core empirical claim of the paper supported by both theoretical proof and empirical demonstration
- Framework informing path-specific objectives by identifying causal chains leading to risky behaviors
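
Several entries above describe shifting hidden states along a probe direction and checking whether behavior changes. The following is a minimal sketch of that idea on a toy linear readout, not any listed paper's implementation. The names `readout`, `flip_rate`, `causal_dir`, and `inert_dir` are hypothetical, and the hidden states and weights are random placeholders rather than real model activations.

```python
import numpy as np

def readout(hidden, weights):
    """Toy downstream computation: judge 'true' when the linear logit is positive."""
    return hidden @ weights > 0

def flip_rate(hidden, weights, direction, alpha):
    """Fraction of judgments that flip after shifting along `direction`.

    Each state is pushed against its current judgment: the direction is
    subtracted for 'true' states and added for 'false' states.
    """
    unit = direction / np.linalg.norm(direction)
    before = readout(hidden, weights)
    sign = np.where(before, -1.0, 1.0)[:, None]
    after = readout(hidden + sign * alpha * unit, weights)
    return float(np.mean(before != after))

rng = np.random.default_rng(0)
dim = 8
weights = rng.normal(size=dim)       # assumed downstream readout
hidden = rng.normal(size=(5, dim))   # assumed hidden states

# A direction the readout actually uses, and one orthogonal to it.
causal_dir = weights
noise = rng.normal(size=dim)
inert_dir = noise - (noise @ weights) / (weights @ weights) * weights

# Step size large enough to carry every state across the decision boundary.
alpha = 2 * np.abs(hidden @ weights).max() / np.linalg.norm(weights)

print("causal direction flip rate:", flip_rate(hidden, weights, causal_dir, alpha))
print("inert direction flip rate:", flip_rate(hidden, weights, inert_dir, alpha))
```

In this toy setting the intervention succeeds only along the direction the readout depends on, which is the sense in which intervention success, rather than correlation alone, is taken as evidence of causal control.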