finding

active

finding:npi-licensing-mechanism-in-pythia-1b-emerges-in-discrete-stages-steps-1000-2000-3000-not-gradually

NPI licensing mechanism in pythia-1b emerges in discrete stages (steps 1000, 2000, 3000) not gradually

Training-dynamics finding showing that the NPI licensing mechanism emerges abruptly rather than gradually

Source paper

extracted_from

CausalGym: Benchmarking causal interpretability methods on linguistic tasks

(2024) · Aryaman Arora · Dan Jurafsky · Christopher Potts

Neighborhood — ranked by edge-count

Claims (1)

claim

The mechanisms implementing NPI licensing and filler-gap dependencies are learned in discrete stages, not gradually
supports
Main mechanistic finding from case studies; evidence from training checkpoint analysis of pythia-1b

Findings (1)

finding

Filler-gap dependency mechanism in pythia-1b emerges in two discrete stages (steps 2000 and 10K) not gradually
supports
Training-dynamics finding showing that the filler-gap mechanism takes longer to learn than NPI licensing

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

NPI mechanism in pythia-1b moves negation feature through complementiser 'that', auxiliary verb, and main verb across layers before predicting NPI 'any' · finding · 0.828
Mechanistic finding from CausalGym case study showing multi-step information movement in NPI mechanism
Filler-gap mechanism in pythia-1b crosses over several different positions before arriving at output position · finding · 0.753
Mechanistic finding from CausalGym case study showing complex multi-step movement for filler-gap
Across 5 Pythia seeds, one seed fails to learn IOI task and another fails alignment despite learning the task; all other seeds achieve perfect alignment with ϕ_nonlin · finding · 0.745
Robustness check across seeds showing occasional failures of alignment map training
pythia-14m achieves only 0.38 accuracy on npi_ever_subj-relc task · finding · 0.736
Baseline accuracy showing small models fail on harder NPI licensing tasks
Smaller fully trained Pythia models (31M, 70M) show slightly reduced alignment accuracy compared to larger models despite non-linear maps · finding · 0.729
Attributed to model anisotropy from saturation making hidden states harder to access
For both NPI and filler-gap tasks, the model initially learns to move information directly from alternating token to output; intermediate steps are added later in training · claim · 0.728
Mechanistic interpretation of training dynamics in case studies
DAS consistently finds the most causally-efficacious features across all pythia model sizes in CausalGym · finding · 0.716
Main benchmark result showing DAS superiority over probing, diff-in-means, PCA, k-means, LDA, and random
8-layer ϕ_nonlin achieves near-perfect IIA on Pythia-410m at all training steps including random initialisation on IOI task · finding · 0.713
Training progression result showing non-linear maps are uncorrelated with genuine task learning
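
The "related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be sketched roughly as follows. This is a minimal illustration, not the page's actual implementation: the function names, entity ids, and two-dimensional toy vectors are all hypothetical, and the real embedding model is not specified in the source.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs that are similar to `target`
    but share no typed edge with it, sorted by descending score.

    `embeddings` maps entity id -> vector; `typed_edges` is a list of
    (source_id, destination_id) pairs. Both are hypothetical structures.
    """
    # Entities already connected to the target by a typed edge, in either direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}

    scored = []
    for other, vec in embeddings.items():
        if other == target or other in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            scored.append((other, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors for illustration only; they are not real embeddings.
embeddings = {
    "npi-stages": [1.0, 0.0],
    "npi-path": [0.8, 0.6],           # similar, no typed edge: kept
    "filler-gap-stages": [0.9, 0.1],  # similar, but has a typed edge: excluded
    "unrelated": [0.0, 1.0],          # below threshold: excluded
}
typed_edges = [("filler-gap-stages", "npi-stages")]

candidates = related_by_similarity("npi-stages", embeddings, typed_edges)
for entity_id, score in candidates:
    print(f"{entity_id}: {score:.3f}")
```

Excluding entities that already have a typed edge keeps the list focused on what the page describes: candidates for new edges or unrecognized duplicates.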