concept
active
concept:intervention-propagation
Intervention Propagation
The property that an additive modification to activations carries forward into every computation downstream of the edited site, which makes behavioral control tractable.
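A minimal sketch of this property, assuming a toy residual stack with no attention, normalization, or trained weights. All names here (`make_layer`, `run`, `steer`, `at_layer`) are hypothetical and chosen only for illustration. It adds a fixed vector to the stream before one layer and compares every layer's state against an unmodified run.

```python
import math
import random


def make_layer(dim, rng):
    # Random toy weights (assumption: small values, not a trained model).
    return [[rng.uniform(-0.2, 0.2) for _ in range(dim)] for _ in range(dim)]


def layer_update(weights, x):
    # Nonlinear update that the layer writes back into the stream.
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def run(layers, x, steer=None, at_layer=None):
    """Run the residual stack and return the stream state after each layer.

    If `steer` is given, it is added to the stream just before layer `at_layer`.
    """
    states = []
    for i, weights in enumerate(layers):
        if steer is not None and i == at_layer:
            x = [a + b for a, b in zip(x, steer)]  # the additive intervention
        update = layer_update(weights, x)
        x = [a + u for a, u in zip(x, update)]  # residual connection
        states.append(x)
    return states


def l2_diff(a, b):
    # Euclidean distance between two stream states.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


rng = random.Random(0)
dim, depth = 4, 6
layers = [make_layer(dim, rng) for _ in range(depth)]
x0 = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
steer = [1.0, 0.0, 0.0, 0.0]

clean = run(layers, x0)
steered = run(layers, x0, steer=steer, at_layer=3)

# Per-layer distance between the clean and the steered run.
diffs = [l2_diff(c, s) for c, s in zip(clean, steered)]
for i, d in enumerate(diffs):
    print(f"layer {i}: diff = {d:.4f}")
```

In this sketch the layers before the edit match the clean run exactly, while every layer from the edit onward differs, because the residual connection carries the added vector forward and each later layer reads it. The sketch shows propagation only. It does not show that any particular behavior becomes controllable.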
Neighborhood — ranked by edge-count
Concepts (1)
- Residual Stream (concept, associated_with): Proposed pathway flowing through layers at each position; calculates K/V values that feed horizontal information flow.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
- Inference mechanism underlying active inference; updates posterior beliefs via gradient descent on free energy.
- pyvene's approach of storing interventions as shareable serialized objects rather than runtime code
- Intervention targeting specific dimensional subsets of activation vectors rather than full representations
- Fundamental operation for causal abstraction analysis; forces neurons to take values from source inputs to create counterfactuals.
- Intervention mode where interventions are applied sequentially, each building on the previous one
- Method of shifting hidden state activations along probe directions to cause the model to treat false statements as true and vice versa; evaluated on OOD inputs
- Standard learning algorithm for deep neural networks that propagates error signals to adjust weights; lacks convergence guarantee for non-linearly separable functions