pythia-14m achieves only 0.38 accuracy on npi_ever_subj-relc task

Baseline accuracy showing small models fail on harder NPI licensing tasks

Source paper

extracted_from

CausalGym: Benchmarking causal interpretability methods on linguistic tasks

(2024) · Aryaman Arora · Dan Jurafsky · Christopher Potts

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Smaller fully trained Pythia models (31M, 70M) show slightly reduced alignment accuracy compared to larger models despite non-linear maps · finding · 0.805
Attributed to model anisotropy from saturation making hidden states harder to access
Probe achieves selectivity of 4.20 on pythia-410m, slightly exceeding DAS selectivity of 3.96 · finding · 0.796
Key result showing that for models larger than pythia-70m, probe selectivity matches or exceeds DAS selectivity
Linear probe achieves 100% classification accuracy for almost all components in Pythia-6.9B gender task · finding · 0.793
Demonstrates that linear probes can overestimate causal relevance; probes succeed on non-causally-relevant representations
Pythia-6.9B achieves 100% accuracy on gendered pronoun prediction task · finding · 0.776
Baseline result confirming the model has fully learned the gender prediction task before probing
8-layer ϕ_nonlin achieves near-perfect IIA on Pythia-410m at all training steps including random initialisation on IOI task · finding · 0.757
Training progression result showing non-linear maps are uncorrelated with genuine task learning
Task accuracy on CausalGym increases consistently with model scale from 0.62 (14M) to 0.89 (6.9B) · finding · 0.745
Scaling result showing larger Pythia models perform better on CausalGym linguistic tasks
Brute-force search achieves maximum IIA of 0.60 on MoNLI tasks · finding · 0.744
DAS substantially outperforms brute-force search on MoNLI across all models
NPI mechanism in pythia-1b moves negation feature through complementiser 'that', auxiliary verb, and main verb across layers before predicting NPI 'any' · finding · 0.744
Mechanistic finding from CausalGym case study showing multi-step information movement in NPI mechanism