artifact
active
artifact:pyvene-open-source-python-library

pyvene open-source Python library

The main artifact introduced in the paper: an open-source Python library, distributed on PyPI, for customizable interventions on PyTorch models.

Neighborhood — ranked by edge-count

Methods (7)

method
  • The core method introduced in this paper: finds alignments between high-level causal variables and distributed neural representations via gradient descent.
  • Fundamental operation for causal abstraction analysis; forces neurons to take values from source inputs to create counterfactuals.
  • Standard method in mechanistic interpretability that intervenes on activations; VPD flips this paradigm by patching parameters.
  • Boundless DAS
    implements
    A variant of DAS, introduced by Wu et al. (2023) and implemented in pyvene via BoundlessRotatedSpaceIntervention
  • Training technique that induces specific causal structures in neural networks by co-training with interchange interventions
  • Method for fitting a linear classifier on collected activations to predict task-relevant features
  • Zero Ablation
    implements
    Intervention type that sets activations to zero, used for interpretability analysis
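
The probing entry in the list above (fitting a linear classifier on collected activations) can be sketched without any library. This is a toy illustration, not pyvene's API: the `activations` and `labels` values are made up, and `fit_probe` and `predict` are hypothetical helper names that implement plain logistic regression by gradient descent.

```python
# Toy illustration only: hand-made "activations" and a plain-Python linear probe.
import math
from typing import List, Tuple

# Hypothetical collected activations (2 hidden units) and binary feature labels.
activations = [[2.0, 0.0], [3.0, 1.0], [1.5, 0.2],
               [0.0, 2.0], [1.0, 3.0], [0.2, 1.5]]
labels = [1, 1, 1, 0, 0, 0]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_probe(xs: List[List[float]], ys: List[int],
              lr: float = 0.5, steps: int = 200) -> Tuple[List[float], float]:
    """Fit logistic regression (a linear probe) with batch gradient descent."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Prediction error for this example under the current probe.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi / n
            grad_b += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Classify one activation vector with the fitted probe."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0.0)


w, b = fit_probe(activations, labels)
accuracy = sum(predict(w, b, x) == y
               for x, y in zip(activations, labels)) / len(labels)
```

On real models the activations would be collected from a chosen layer, and the probe would be evaluated on held-out examples rather than on its own training set.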

Concepts (2)

concept
  • A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
  • The fundamental operation of making in-place changes to model activations, placing the model in a counterfactual state
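
The idea of an intervention as an overwrite of activations, including the interchange intervention and zero ablation entries under Methods, can be illustrated on a tiny hand-written network. This is a minimal sketch under stated assumptions: the weights `W1` and `W2` and the helpers `run`, `zero_ablate`, and `interchange` are hypothetical and are not part of pyvene.

```python
# Toy illustration only: a hand-written two-layer network, not pyvene's API.
from typing import Callable, List, Optional

Vector = List[float]

# Hypothetical fixed weights: 2 inputs, 2 hidden units (ReLU), 1 output.
W1 = [[1.0, -1.0], [0.5, 2.0]]
W2 = [1.0, -2.0]


def hidden(x: Vector) -> Vector:
    """Compute the hidden activations for input x."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]


def run(x: Vector,
        intervene: Optional[Callable[[Vector], Vector]] = None) -> float:
    """Forward pass; `intervene` may overwrite the hidden activations."""
    h = hidden(x)
    if intervene is not None:
        h = intervene(h)  # the model continues from a counterfactual state
    return sum(w * hj for w, hj in zip(W2, h))


def zero_ablate(units: List[int]) -> Callable[[Vector], Vector]:
    """Intervention that sets the chosen hidden units to zero."""
    return lambda h: [0.0 if j in units else hj for j, hj in enumerate(h)]


def interchange(source: Vector, units: List[int]) -> Callable[[Vector], Vector]:
    """Intervention that copies the chosen units from a run on `source`."""
    h_src = hidden(source)
    return lambda h: [h_src[j] if j in units else hj for j, hj in enumerate(h)]


base, source = [1.0, 0.0], [0.0, 1.0]
clean = run(base)                              # 0.0
ablated = run(base, zero_ablate([1]))          # 1.0
swapped = run(base, interchange(source, [1]))  # -3.0
```

Comparing `clean` with `ablated` and `swapped` shows how much the output depends on hidden unit 1, which is the kind of counterfactual comparison these interventions are used for.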

Claims (1)

claim