finding
active
finding:pythia-14m-achieves-only-0-38-accuracy-on-npi-ever-subj-relc-task

pythia-14m achieves only 0.38 accuracy on the npi_ever_subj-relc task

Baseline accuracy of 0.38 for pythia-14m, suggesting that small models fail on harder NPI (negative polarity item) licensing tasks.

Source paper

extracted_from
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
(2024) · Aryaman Arora · Dan Jurafsky · Christopher Potts
