finding
active
finding:das-achieves-overall-odds-ratio-of-10-24-on-pythia-410m-averaged-across-all-causalgym-tasks
DAS achieves overall odds-ratio of 10.24 on pythia-410m averaged across all CausalGym tasks
Numerical result for pythia-410m
Source paper
extracted_from · (2024) · Aryaman Arora · Dan Jurafsky · Christopher Potts
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; they are candidates for new edges or unrecognized duplicates.
- DAS consistently finds the most causally-efficacious features across all pythia model sizes in CausalGym · finding · 0.833
  Main benchmark result showing DAS superiority over probing, diff-in-means, PCA, k-means, LDA, and random
- DAS learning rate of 5e-3 outperforms 1e-3 (used in Wu et al. 2023) for small training sets in CausalGym · finding · 0.790
  Hyperparameter tuning result for DAS; differs from prior work because of the smaller training set size
- Surprising negative result for LDA despite being a supervised method
- Task accuracy on CausalGym increases consistently with model scale from 0.62 (14M) to 0.89 (6.9B) · finding · 0.760
  Scaling result showing larger pythia models perform better on CausalGym linguistic tasks
- Probe achieves selectivity of 4.20 on pythia-410m, slightly exceeding DAS selectivity of 3.96 · finding · 0.759
  Key result showing that for models larger than pythia-70m, probe selectivity matches or exceeds DAS selectivity
- DAS runtime is invariant with number of testing hypotheses, unlike brute-force search.
- MDS achieves global win proportion of 89.5% on SJTs across 14 LLMs and four injection strides · finding · 0.751
  MDS dominates in open-ended generation by global win proportion metric (Table 2)
- Baseline accuracy showing small models fail on harder NPI licensing tasks
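
The "cosine ≥ 0.65 · no typed edge" filter that produces a list like the one above can be sketched as follows. This is a minimal illustration and not the system's actual implementation: the function names, the entity IDs, the toy two-dimensional embeddings, and the representation of typed edges as unordered pairs are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_untyped(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs whose cosine similarity to the target
    is at least `threshold` and that share no typed edge with it,
    sorted from most to least similar."""
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # skip the entity itself
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data; real embeddings would have far more dimensions.
embeddings = {
    "das_410m": [1.0, 0.0],
    "das_all_sizes": [0.8, 0.6],  # similar and untyped: should be listed
    "source_paper": [1.0, 0.0],   # similar but already has a typed edge
    "unrelated": [0.0, 1.0],      # orthogonal: below the threshold
}
typed_edges = {frozenset(("das_410m", "source_paper"))}

neighbors = similar_untyped("das_410m", embeddings, typed_edges)
for entity_id, score in neighbors:
    print(f"{entity_id}\t{score:.3f}")
```

Treating typed edges as unordered pairs is a simplification; a graph with directed relations such as `extracted_from` might instead check both directions explicitly.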