claim
active

For both the NPI (negative polarity item) and filler-gap tasks, the model initially learns to move information directly from the alternating token to the output; intermediate steps are added later in training.

Mechanistic interpretation of training dynamics in case studies

Source paper

extracted_from
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
(2024) · Aryaman Arora · Dan Jurafsky · Christopher Potts
