concept
active
concept:neural-network-intervention
Neural Network Intervention
The fundamental operation of making in-place changes to model activations, placing the model in a counterfactual state
Neighborhood — ranked by edge-count
Concepts (5)
concept
- Counterfactual State (associated_with): The state a neural network is placed in when its activations are modified via intervention
- Serial Intervention (extends): Intervention mode where interventions are applied sequentially, each building on the previous one
- Intervenable Configuration (implements): Dict-based configuration format in pyvene that outlines which model components will be intervened upon
- Parallel Intervention (extends): Intervention mode where multiple interventions are applied simultaneously to the same base computation graph
- Subspace Intervention (extends): Intervention targeting specific dimensional subsets of activation vectors rather than full representations
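The concepts above can be illustrated with a minimal sketch in plain Python. It is not the pyvene API: the toy network, its weights, and the helper names (`hidden`, `readout`, `run`, `interchange`, `serial`) are all hypothetical and chosen only for illustration. The sketch shows a full-vector intervention, a subspace intervention on selected hidden dimensions, and a serial composition; the parallel mode is not covered.

```python
# Toy two-layer network; weights are arbitrary illustrative values (assumption).
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 hidden units, 2 inputs
W2 = [1.0, -1.0, 0.5]                      # 1 output, 3 hidden units


def hidden(x):
    """First layer: linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]


def readout(h):
    """Second layer: linear readout of the hidden activations."""
    return sum(w * hi for w, hi in zip(W2, h))


def run(x, intervene=None):
    """Forward pass; `intervene`, if given, rewrites the hidden activations."""
    h = hidden(x)
    if intervene is not None:
        h = intervene(h)  # the model is now in a counterfactual state
    return readout(h)


def interchange(source_x, dims=None):
    """Build an intervention that copies hidden values from a source run.

    dims=None replaces the full vector; a list of indices restricts the
    swap to that subspace of the hidden activations.
    """
    src = hidden(source_x)

    def apply(h):
        out = list(h)
        for i in (range(len(h)) if dims is None else dims):
            out[i] = src[i]
        return out

    return apply


def serial(*interventions):
    """Compose interventions so each one acts on the previous one's result."""
    def apply(h):
        for f in interventions:
            h = f(h)
        return h

    return apply


base, source = [1.0, 2.0], [3.0, 0.0]
clean = run(base)                                     # no intervention
full = run(base, interchange(source))                 # full-vector swap
sub = run(base, interchange(source, dims=[0]))        # subspace swap
chained = run(base, serial(interchange(source, [0]),
                           interchange(source, [1])))  # serial composition
```

With these values the subspace swap moves the output only part of the way from the base result toward the source result, because two of the three hidden units keep their base activations.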
Artifacts (1)
artifact
- The main artifact introduced in the paper: an open-source PyPI library for customizable interventions on PyTorch models
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one. These are candidates for new edges or unrecognized duplicates.
- The artificial agents trained with RL in this study, whose latent dynamics are analyzed for causal emergence.
- Cognition in nervous systems, used as a modelling target
- Property that additive modifications to activations affect all downstream computations, enabling tractable behavioral control
- The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
- Manipulation of activations along a straight line; shown to fail when it crosses voids, in contrast to manifold-following interventions.
- Fundamental operation for causal abstraction analysis; forces neurons to take values from source inputs to create counterfactuals.
- Scalar function of the input corresponding to a direction in the vector space of neuron activations; claimed to be the fundamental unit of neural networks