finding

active

finding:pyvene-reproduces-meng-et-al-2022-figure-1-factual-association-localization-in-gpt2-xl-in-about-20-lines-of-code

pyvene reproduces Meng et al. 2022 Figure 1 (factual association localization in GPT2-XL) in about 20 lines of code

Case Study I of the pyvene paper, showing that the library can replicate a major interpretability result compactly

Source paper

extracted_from

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

(2024) · Zhengxuan Wu · Atticus Geiger · Aryaman Arora · Jing Huang +4

Neighborhood — ranked by edge-count

Papers (1)

paper

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
introduces

Claims (1)

claim

pyvene provides a unified and extensible framework for performing interventions on neural models and sharing the intervened upon models with others
supports
Core design claim of the pyvene paper summarizing its contribution over existing tools

Questions (1)

question

Where and how is information stored in model-internal representations?
answered_by
Core question motivating interchange intervention and interpretability research supported by pyvene

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
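As a rough illustration of the rule stated above (cosine ≥ 0.65, no typed edge), the sketch below shows one way such a neighborhood could be computed. It is not this site's actual implementation: the function names, the embedding dictionary, and the representation of typed edges as unordered ID pairs are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs similar to target_id, highest score first.

    embeddings:  dict of entity ID -> vector (hypothetical storage format).
    typed_edges: set of frozenset({id_a, id_b}) pairs that already share a
                 typed relation; these are excluded from the result.
    """
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # an entity is not its own neighbor
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed edge
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, round(score, 3)))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

Entities returned by a filter like this are only candidates: a high score suggests a missing edge or a duplicate, and each one still needs review before a typed relation is added.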

We reproduce the results in Meng et al. (2022)'s Figure 1 of locating early sites and late sites of factual associations in GPT2-XL in about 20 lines of pyvene code. · quote · 0.924
Load-bearing demonstration of pyvene's conciseness for complex replication tasks
Locating and Editing Factual Associations in GPT (Meng et al., 2022) · concept · 0.817
Cited as causal intervention methodology precedent
GPT-2 implements at least one induction head using pointer arithmetic on positional embeddings rather than K-composition · hypothesis · 0.743
Observation of an alternative induction head implementation algorithm in larger models with positional embeddings in the residual stream
Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small (Wang et al., 2023) · concept · 0.737
Cited as causal intervention methodology precedent for this paper's ablation approach
Sentence localization accuracy reaches 88% at layer 2, α=5 vs. 10% chance in 10-way classification · finding · 0.728
Highest localization accuracy achieved, showing strong partial introspection for early-layer injections
Software implementations for all of the models/behaviours presented are common for n = 2, and can be made very efficient for α_i that map many objects onto a much smaller set of object families. · claim · 0.726
Claim about current practical feasibility and efficiency of 2-way associative implementations.
Under ask-correct, probes trained on arithmetic tasks A1-A3 generalize almost perfectly to factual tasks F0-F2 (AUROC ~1.0), whereas under no-prompt this generalization is largely absent. · finding · 0.724
Key improvement in cross-task generalization enabled by explicit instruction framing.
DAS consistently finds the most causally-efficacious features across all pythia model sizes in CausalGym · finding · 0.722
Main benchmark result showing DAS superiority over probing, diff-in-means, PCA, k-means, LDA, and random