Neural Network Intervention

The fundamental operation of making in-place changes to model activations, placing the model in a counterfactual state
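
To make the definition concrete, here is a minimal sketch in plain Python. The two-layer toy network, its weights, and the names `layer1`, `layer2`, `forward`, and `zero_unit_0` are all hypothetical and chosen for illustration; nothing here is pyvene's API. The point is only that the hidden activation is modified in place before the rest of the forward pass runs, so the output reflects a counterfactual state rather than the factual one.

```python
from typing import Callable, List, Optional

Vector = List[float]


def layer1(x: Vector) -> Vector:
    # Hypothetical toy hidden layer: two ReLU units with fixed weights.
    return [max(0.0, x[0] + x[1]), max(0.0, x[0] - x[1])]


def layer2(h: Vector) -> float:
    # Hypothetical toy readout: weighted sum of the hidden units.
    return 2.0 * h[0] - 1.0 * h[1]


def forward(
    x: Vector,
    intervention: Optional[Callable[[Vector], Vector]] = None,
) -> float:
    h = layer1(x)
    if intervention is not None:
        # The intervention edits the activation before downstream layers see it.
        h = intervention(h)
    return layer2(h)


def zero_unit_0(h: Vector) -> Vector:
    # In-place change to the activation: silence hidden unit 0.
    h[0] = 0.0
    return h


base = [2.0, 1.0]
factual = forward(base)                      # hidden state [3.0, 1.0]
counterfactual = forward(base, zero_unit_0)  # hidden state [0.0, 1.0]

print(factual, counterfactual)
```

With a real PyTorch model the same idea is usually implemented with forward hooks rather than an explicit callback argument; the callback is used here only to keep the sketch dependency-free.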

Neighborhood — ranked by edge-count

concept

Counterfactual State
associated_with
The state a neural network is placed in when its activations are modified via intervention
Serial Intervention
extends
Intervention mode where interventions are applied sequentially, each building on the previous one
Intervenable Configuration
implements
Dict-based configuration format in pyvene that outlines which model components will be intervened upon
Parallel Intervention
extends
Intervention mode where multiple interventions are applied simultaneously to the same base computation graph
Subspace Intervention
extends
Intervention targeting specific dimensional subsets of activation vectors rather than full representations
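
The three intervention modes listed above can be contrasted with a small sketch. This is one possible reading of the descriptions on this page, not pyvene's implementation: the toy `hidden` function, the helper names `swap_subspace` and `apply_parallel`, and the choice of value-swapping as the intervention are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def hidden(x: Vector) -> Vector:
    # Hypothetical toy encoder: 4 hidden units computed from 2 inputs.
    return [x[0], x[1], x[0] + x[1], x[0] - x[1]]


def swap_subspace(base_h: Vector, source_h: Vector, dims: Sequence[int]) -> Vector:
    # Subspace intervention: overwrite only the listed dimensions,
    # leaving the rest of the base activation untouched.
    out = list(base_h)
    for d in dims:
        out[d] = source_h[d]
    return out


def apply_parallel(
    base_h: Vector, edits: Sequence[Tuple[Vector, Sequence[int]]]
) -> Vector:
    # Parallel mode (as read here): every edit targets the same base
    # activation, and each source is used unmodified.
    out = list(base_h)
    for source_h, dims in edits:
        for d in dims:
            out[d] = source_h[d]
    return out


base_h = hidden([1.0, 0.0])   # [1.0, 0.0, 1.0, 1.0]
src_a_h = hidden([0.0, 2.0])  # [0.0, 2.0, 2.0, -2.0]
src_b_h = hidden([3.0, 3.0])  # [3.0, 3.0, 6.0, 0.0]

# Subspace: only dims 0 and 1 come from source A.
subspace_h = swap_subspace(base_h, src_a_h, [0, 1])

# Parallel: source A sets dim 0 and source B sets dim 1 on the same base.
parallel_h = apply_parallel(base_h, [(src_a_h, [0]), (src_b_h, [1])])

# Serial (as read here): source A first intervenes on source B, and the
# resulting intermediate state is then used to intervene on the base.
intermediate_h = swap_subspace(src_b_h, src_a_h, [1])
serial_h = swap_subspace(base_h, intermediate_h, [0, 1])

print(subspace_h, parallel_h, serial_h)
```

In this sketch the serial and parallel results differ because the second serial step reads from an already-modified source, whereas each parallel edit reads from its original source.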

artifact

pyvene open-source Python library
about
The main artifact introduced in the paper: an open-source PyPI library for customizable interventions on PyTorch models

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Neural-network agents · concept · 0.805
The artificial agents trained with RL in this study, whose latent dynamics are analyzed for causal emergence.
Neural Networks · concept · 0.802
neural cognition · concept · 0.768
Cognition in nervous systems, used as a modelling target
Intervention Propagation · concept · 0.766
Property that additive modifications to activations affect all downstream computations, enabling tractable behavioral control
Causal Intervention on Representations · concept · 0.766
The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
linear intervention · concept · 0.763
Manipulation of activations along a straight line; shown to fail when it crosses voids, in contrast to manifold-following interventions.
Interchange Intervention · method · 0.756
Fundamental operation for causal abstraction analysis; forces neurons to take values from source inputs to create counterfactuals.
Feature (neural network) · concept · 0.751
Scalar function of the input corresponding to a direction in the vector space of neuron activations; claimed to be the fundamental unit of neural networks