claim
active
claim:for-both-npi-and-filler-gap-tasks-the-model-initially-learns-to-move-information-directly-from-alternating-token-to-output-intermediate-steps-are-added-later-in-training
For both NPI and filler-gap tasks, the model initially learns to move information directly from alternating token to output; intermediate steps are added later in training
Mechanistic interpretation of training dynamics in case studies
Source paper
extracted_from · (2024) · Aryaman Arora · Dan Jurafsky · Christopher Potts
Neighborhood — ranked by edge-count
Papers (1)
paper
Findings (2)
finding
- Mechanistic finding from CausalGym case study showing complex multi-step movement for filler-gap
- Mechanistic finding from CausalGym case study showing multi-step information movement in NPI mechanism
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Main mechanistic finding from case studies; evidence from training checkpoint analysis of pythia-1b
- Selective pressure toward convergence via task generality
- Training dynamics finding showing filler-gap takes longer to learn than NPI licensing
- Claim about current practical feasibility and efficiency of 2-way associative implementations.
- Central interpretive claim and motivation for future work
- Shows high IIA on random models depends on entity overlap; generalisation is essential for genuine interpretation
- Comparative prediction motivating future work contrasting different approaches to LLM self-knowledge
- Antra's foundational claim about how introspection arises computationally rather than from memorised text.