finding
active
NPI mechanism in pythia-1b moves the negation feature through the complementiser 'that', the auxiliary verb, and the main verb across layers before predicting the NPI 'any'
Mechanistic finding from a CausalGym case study showing multi-step information movement in the NPI mechanism.
Source paper
extracted_from (2024) · Aryaman Arora · Dan Jurafsky · Christopher Potts
Neighborhood (ranked by edge count)
Claims (1)
claim
- Mechanistic interpretation of training dynamics in case studies
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- NPI licensing mechanism in pythia-1b emerges in discrete stages (steps 1000, 2000, 3000), not gradually (finding, cosine 0.828): training dynamics finding showing abrupt rather than gradual emergence of the NPI mechanism
- Training dynamics finding showing filler-gap takes longer to learn than NPI licensing
- Mechanistic finding from CausalGym case study showing complex multi-step movement for filler-gap
- Robustness check across seeds showing occasional failures of alignment map training
- Main mechanistic finding from case studies; evidence from training checkpoint analysis of pythia-1b
- Key limitation acknowledged by the authors.
- DAS consistently finds the most causally efficacious features across all pythia model sizes in CausalGym (finding, cosine 0.748): main benchmark result showing DAS superiority over probing, diff-in-means, PCA, k-means, LDA, and random
- Open question about inter-agent communication beyond the model-space assumption
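The "Related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal illustration, not the page's actual implementation: it assumes each entity has a precomputed embedding vector and that typed edges are stored as a set of undirected id pairs. The entity ids (`npi-mech`, `npi-stages`, and so on) and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

# Cutoff taken from the "cosine ≥ 0.65" label on this page.
SIMILARITY_THRESHOLD = 0.65


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(
    focus: str,
    embeddings: Dict[str, List[float]],
    typed_edges: Set[Tuple[str, str]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> List[Tuple[str, float]]:
    """Entities at or above `threshold` that share no typed edge with `focus`,
    sorted by descending cosine score. Edges are treated as undirected (assumption)."""
    results: List[Tuple[str, float]] = []
    for other, vec in embeddings.items():
        if other == focus:
            continue
        # Skip anything already linked by a typed relation (e.g. the claim above).
        if (focus, other) in typed_edges or (other, focus) in typed_edges:
            continue
        score = cosine(embeddings[focus], vec)
        if score >= threshold:
            results.append((other, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Hypothetical toy data: ids and vectors are illustrative only.
toy_embeddings = {
    "npi-mech": [1.0, 0.0, 0.0],
    "npi-stages": [0.9, 0.1, 0.0],
    "das-result": [0.7, 0.7, 0.0],
    "claim-1": [1.0, 0.1, 0.0],
    "unrelated": [0.0, 1.0, 0.0],
}
toy_edges = {("npi-mech", "claim-1")}

neighbours = related_by_similarity("npi-mech", toy_embeddings, toy_edges)
for entity_id, score in neighbours:
    print(f"{entity_id}: {score:.3f}")
```

In this toy run `claim-1` is excluded despite its high similarity because it already has a typed edge, which mirrors how the claim listed under "Claims (1)" does not reappear among the similarity-only neighbors.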