thinker:zheng-wang
Zheng Wang
Authored papers (1)
pyvene is an open-source Python library that unifies intervention-based research on PyTorch neural models. It treats the intervention itself, rather than model surgery code, as the primitive abstraction, and expresses it in a serializable dict-based configuration that can be shared via HuggingFace. Prior libraries (BauKit, TransformerLens, nnsight, graphpatch, Transformer Debugger) either lack extensibility to recurrent and convolutional architectures or require sophisticated custom code for multi-source, cross-forward-pass interventions. pyvene addresses both limitations with Getter/Setter hooks that track state variables, which enables intervention at arbitrary time steps in GRUs and other recurrent models. The library ships with trainable intervention types, including RotatedSpaceIntervention (Distributed Alignment Search, or DAS), LowRankRotatedSpaceIntervention, and BoundlessRotatedSpaceIntervention, and it reproduces Meng et al.'s factual-association localization result in GPT2-XL in approximately 20 lines of code. A second case study on Pythia-6.9B shows that a 1D DAS intervention finds sparse, causally localized gender representations across layers, whereas a linear probe achieves near-100% classification accuracy almost everywhere. This implies that high probe accuracy is insufficient evidence of causal relevance, and that trainable interventions provide a strictly more diagnostic test of whether a representation is mechanistically load-bearing for a behavior.
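The idea of a dict-configured intervention applied through getter and setter hooks at a chosen time step can be illustrated with a library-free toy. The sketch below is not pyvene's API: the functions `step`, `run`, and `interchange`, the config keys `time_step` and `unit`, and the two-unit recurrent "model" are all hypothetical names invented for this illustration. It assumes only that an interchange intervention means caching a hidden unit from a source run and writing it into a base run at the same location.

```python
import json

def step(h, x):
    """One toy recurrent update: h[0] accumulates inputs, h[1] counts steps."""
    return [h[0] + x, h[1] + 1]

def run(inputs, getter=None, setter=None):
    """Unroll the toy model; hooks see the hidden state after every time step."""
    h = [0, 0]
    for t, x in enumerate(inputs):
        h = step(h, x)
        if getter is not None:
            getter(t, h)          # read-only hook: may cache activations
        if setter is not None:
            h = setter(t, h)      # write hook: may replace the hidden state
    return 10 * h[0] + h[1]       # toy readout of the final state

def interchange(config, base, source):
    """Copy one hidden unit from the source run into the base run.

    `config` is a plain dict (hypothetical keys), so it survives a JSON round trip.
    """
    t_star, unit = config["time_step"], config["unit"]
    cache = {}

    def getter(t, h):
        if t == t_star:
            cache["value"] = h[unit]

    def setter(t, h):
        if t == t_star:
            h = list(h)           # copy so the original state is not mutated
            h[unit] = cache["value"]
        return h

    run(source, getter=getter)        # first forward pass: collect activation
    return run(base, setter=setter)   # second forward pass: patch it in

# The config is plain data, so it can be serialized and shared.
config = json.loads(json.dumps({"time_step": 1, "unit": 0}))
base, source = [1, 2, 3], [10, 20, 30]
clean = run(base)
patched = interchange(config, base, source)
```

In this toy, `clean` is 63 and `patched` is 333, because unit 0 carries the accumulated input and overwriting it changes the readout. Patching unit 1 (the step counter) at the same time step leaves the output at 63, since base and source agree on that unit: the intervention reveals which unit is causally relevant to the difference between the two inputs.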
Affiliations (1)
- Stanford University (institute)
Co-authors (7)
- Aryaman Arora (9 shared)
- Atticus Geiger (9 shared)
- Christopher D. Manning (9 shared)
- Christopher Potts (9 shared)
- Jing Huang (9 shared)
- Noah D. Goodman (9 shared)
- Zhengxuan Wu (9 shared)
Their work is cited by (3)
Recent mentions (1)
- papers-typedwu-2024-pyvene-library.md