artifact
active
artifact:pyvene-open-source-python-library
pyvene open-source Python library
The main artifact introduced in the paper: an open-source PyPI library for customizable interventions on PyTorch models
Neighborhood — ranked by edge-count
Papers (1)
paper
Methods (7)
- Distributed Alignment Search (implements): The core method introduced in this paper: finds alignments between high-level causal variables and distributed neural representations via gradient descent.
- Interchange Intervention (implements): Fundamental operation for causal abstraction analysis; forces neurons to take values from source inputs to create counterfactuals.
- Activation patching (implements): Standard method in mechanistic interpretability that intervenes on activations; VPD flips this paradigm by patching parameters.
- Boundless DAS (implements): A variant of DAS implemented in pyvene via BoundlessRotatedSpaceIntervention, introduced by Wu et al. 2023.
- Interchange Intervention Training (IIT) (implements): Training technique that induces specific causal structures in neural networks by co-training with interchange interventions.
- Linear Probe Training (implements): Method for fitting a linear classifier on collected activations to predict task-relevant features.
- Zero Ablation (implements): Intervention type that sets activations to zero, used for interpretability analysis.
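
The interchange intervention (activation patching) and zero ablation entries above can be illustrated with a minimal sketch on a toy two-layer network. This is plain Python and does not use pyvene's API; the weights, the `run`, `interchange`, and `zero_ablate` helpers, and the example inputs are all hypothetical names chosen for illustration.

```python
from typing import Callable, List, Optional, Set, Tuple

Vector = List[float]

# Hypothetical toy weights: 2 inputs -> 2 hidden units (ReLU) -> 1 output.
W1 = [[1.0, -1.0], [0.5, 0.5]]
B1 = [0.0, 0.0]
W2 = [[1.0, 2.0]]
B2 = [0.0]


def linear(weights: List[Vector], bias: Vector, x: Vector) -> Vector:
    """Compute weights @ x + bias for small list-based matrices."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def run(x: Vector,
        edit_hidden: Optional[Callable[[Vector], Vector]] = None
        ) -> Tuple[Vector, Vector]:
    """Forward pass; optionally rewrite the hidden activations mid-computation."""
    hidden = [max(0.0, h) for h in linear(W1, B1, x)]
    if edit_hidden is not None:
        hidden = edit_hidden(hidden)  # the intervention happens here
    return hidden, linear(W2, B2, hidden)


def interchange(base: Vector, source: Vector, units: Set[int]) -> Vector:
    """Run `base`, but force the chosen hidden units to their `source` values."""
    source_hidden, _ = run(source)

    def swap(hidden: Vector) -> Vector:
        return [source_hidden[i] if i in units else h
                for i, h in enumerate(hidden)]

    return run(base, swap)[1]


def zero_ablate(base: Vector, units: Set[int]) -> Vector:
    """Run `base` with the chosen hidden units set to zero."""
    def zero(hidden: Vector) -> Vector:
        return [0.0 if i in units else h for i, h in enumerate(hidden)]

    return run(base, zero)[1]


base = [2.0, 1.0]
source = [4.0, 2.0]

base_out = run(base)[1]
source_out = run(source)[1]
patched_unit0 = interchange(base, source, {0})
patched_all = interchange(base, source, {0, 1})
ablated_unit1 = zero_ablate(base, {1})
```

Swapping every hidden unit reproduces the source output, while swapping a single unit yields a counterfactual output that neither input produces on its own. Zero ablation uses the same mechanism with a constant replacement value.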
Concepts (2)
- Causal abstraction (implements): A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs.
- The fundamental operation of making in-place changes to model activations, placing the model in a counterfactual state
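
As a rough illustration of how interventions support a causal abstraction check, the sketch below compares a hand-written high-level causal model against a toy low-level "network" under interchange interventions. The two models, the alignment of the high-level variable `S` with hidden unit 0, and the `interchange_accuracy` helper are assumptions made for this example; none of them come from pyvene.

```python
from itertools import product
from typing import List, Optional, Tuple

Triple = Tuple[int, int, int]


def high_level(a: int, b: int, c: int, s_override: Optional[int] = None) -> int:
    """High-level causal model: S = a + b, output = S + c.
    `s_override` intervenes on the intermediate variable S."""
    s = a + b if s_override is None else s_override
    return s + c


def low_level_hidden(a: int, b: int, c: int) -> List[int]:
    """Toy low-level model: hidden unit 0 holds a + b, hidden unit 1 holds c."""
    return [a + b, c]


def low_level_output(hidden: List[int]) -> int:
    return hidden[0] + hidden[1]


def interchange_accuracy(unit: int, inputs: List[Triple]) -> float:
    """Fraction of (base, source) pairs where patching `unit` from the source
    run matches the high-level model with S taken from the source."""
    matches = 0
    total = 0
    for base, source in product(inputs, repeat=2):
        expected = high_level(*base, s_override=source[0] + source[1])
        hidden = low_level_hidden(*base)
        hidden[unit] = low_level_hidden(*source)[unit]  # in-place intervention
        matches += int(low_level_output(hidden) == expected)
        total += 1
    return matches / total


inputs = list(product([0, 1, 2], repeat=3))
aligned = interchange_accuracy(0, inputs)     # S aligned with hidden unit 0
misaligned = interchange_accuracy(1, inputs)  # S wrongly aligned with unit 1
```

Under the correct alignment every counterfactual agrees with the high-level model; under the wrong alignment only the pairs that happen to coincide agree, so the score drops below 1.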
Claims (1)
- Technical claim justifying pyvene's state-variable hook tracking for recurrent model support
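
One way to picture why a hook needs a state variable for recurrent models: the same component is called once per time step, so a stateless hook cannot tell the steps apart. The sketch below is not pyvene's implementation; `StepTargetedHook`, `run_recurrent`, and the toy recurrence are hypothetical, and the hook simply counts its own invocations so that it can intervene at one chosen step.

```python
from typing import List, Optional


class StepTargetedHook:
    """Hypothetical hook that replaces the state at one specific time step."""

    def __init__(self, target_step: int, new_state: float) -> None:
        self.target_step = target_step
        self.new_state = new_state
        self.calls = 0  # state variable: how many times the hook has fired

    def __call__(self, state: float) -> float:
        step = self.calls
        self.calls += 1
        # Intervene only when the call count matches the targeted step.
        return self.new_state if step == self.target_step else state


def run_recurrent(inputs: List[float],
                  hook: Optional[StepTargetedHook] = None) -> List[float]:
    """Toy recurrence h_t = 0.5 * h_{t-1} + x_t, with the hook applied
    to the state after every step."""
    state = 0.0
    states = []
    for x in inputs:
        state = 0.5 * state + x
        if hook is not None:
            state = hook(state)
        states.append(state)
    return states


clean = run_recurrent([1.0, 1.0, 1.0])
hook = StepTargetedHook(target_step=1, new_state=0.0)
patched = run_recurrent([1.0, 1.0, 1.0], hook)
```

The patched run matches the clean run before the targeted step, takes the replacement value at that step, and then propagates the change forward through the recurrence.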