Causal analysis of syntactic agreement mechanisms in neural language models

By Matthew Finlayson · Aaron Mueller · Sebastian Gehrmann · Stuart Shieber · Tal Linzen · Yonatan Belinkov

DOI 10.18653/v1/2021.acl-long.144 arXiv 2106.06087

Matthew Finlayson, Aaron Mueller, Sebastian Gehrmann, Stuart Shieber, Tal Linzen, Yonatan Belinkov. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

Related work: refs + corpus + external arXiv

Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows are weighted higher.

Psychologically-Inspired Causal Prompts
Zhijing Jin, Justus Mattern, Rada Mihalcea, Mrinmaya Sachan, Bernhard Schoelkopf, Zhiheng Lyu
2023
≈ 77%
Causal Drawbridges: Characterizing Gradient Blocking of Syntactic Islands in Transformer LMs
Sasha Boguraev and Kyle Mahowald
2026
≈ 77%
Evaluating Neural Language Models as Cognitive Models of Language Acquisition
Annika Lea Heuser, Charles Yang, Jordan Kodner, Héctor Javier Vázquez Martínez
2026
≈ 77%
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
in corpus
2024
≈ 76%
Convergent Representations of Linguistic Constructions in Human and Artificial Neural Systems
Thomas Kinfe, Andreas Maier, Achim Schilling, Patrick Krauss, Pegah Ramezani
2026
≈ 76%
Causal Interventions on Causal Paths: Mapping GPT-2's Reasoning From Syntax to Semantics
Joshua Lum, Ziyi Liu, Dani Yogatama, Isabelle Lee
2024
≈ 76%
Mechanistic Interpretability for Large Language Model Alignment: Progress, Challenges, and Future Directions
Usman Naseem
2026
≈ 76%
The production of meaning in the processing of natural language
Quan Le Thien, Nayan D'Souza, Louis van der Elst, Christopher J. Agostino
2026
≈ 75%
Mechanistic Interpretability of Brain-to-Speech Models Across Speech Modes
Ayushi Mishra, Maryam Maghsoudi
2026
≈ 75%
Across the Levels of Analysis: Explaining Predictive Processing in Humans Requires More Than Machine-Estimated Probabilities
Sathvik Nair and Colin Phillips
2026
≈ 75%
Neural mechanisms of predictive processing: a collaborative community experiment through the OpenScope program
Nicholas Audette, Ryszard Auksztulewicz, Krzysztof Basiński, André M. Bastos, Michael Berry, Andres Canales-Johnson, Hannah Choi, Claudia Clopath, Uri Cohen, Rui Ponte Costa, Roberto De Filippo, Roman Doronin, Séverine Durand, Steven P. Errington, Jeffrey P. Gavornik, Colleen J. Gillon, Arno Granier, Jordan P. Hamm, Loreen Hertäg, Henry Kennedy, Sandeep Kumar, Alexander Ladd, Hugo Ladret, Jérôme A. Lecoq, Alexander Maier, Patrick McCarthy, Jie Mei, Jorge Mejias, John Hongyu Meng, Fabian Mikulasch, Noga Mudrik, Farzaneh Najafi, Kevin Nejad, Hamed Nejat, Karim Oweiss, Mihai A. Petrovici, Viola Priesemann, Lucas Rudelt, Sarah Ruediger, Simone Russo, Alessandro Salatiello, Walter Senn, Eli Sennesh, Sepehr Sima, Cem Uran, Anna Vasilevskaya, Julien Vezoli, Martin Vinck, Xiao-Jing Wang, Jacob A. Westerberg, Katharina Wilmes, Yihan Sophy Xiong, Ido Aizenbud
2026
≈ 75%
Combining Causal Models for More Accurate Abstractions of Neural Networks
Sara Magliacane, Atticus Geiger, Theodora-Mara Pîslar
2025
≈ 75%
Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
in corpus
≈ 75%
Multi-Agent Language Models: Advancing Cooperation, Coordination, and Adaptation
Arjun Vaithilingam Sudhakar
2025
≈ 74%
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
in corpus
2023
≈ 74%
The Nonverbal Syntax Framework: An Evidence-Based Tiered System for Inferring Learner States from Observable Behavioral Cues
Mary John, Jaloliddin Rustamov, Zahiriddin Rustamov, Saja Aldabet, Nazar Zaki, Khaled Shuaib, Sherzod Turaev
2026
≈ 74%
Causal Head Gating: A Framework for Interpreting Roles of Attention Heads in Transformers
Henry Conklin, Yukang Yang, Thomas Griffiths, Jonathan Cohen, Sarah-Jane Leslie, Andrew Nam
2025
≈ 74%
The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling
Qinan Yu, Matianyu Zang, Carsten Eickhoff, Ellie Pavlick, Ruochen Zhang
2024
≈ 74%
The Causally Emergent Alignment Hypothesis: Causal Emergence Aligns with and Predicts Final Reward in Reinforcement Learning Agents
in corpus
2026
≈ 73%
Model Alignment Search
in corpus
2025
≈ 73%
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
in corpus
2025
≈ 72%
Addressing divergent representations from causal interventions on neural networks
in corpus
2025
≈ 72%
Emergence and Causality in Complex Systems: A Survey on Causal Emergence and Related Quantitative Studies
in corpus
2023
≈ 71%
Cognitive glues are shared models of relative scarcities: the economics of collective intelligence
in corpus
2026
≈ 71%
Why Learning Requires Feeling
in corpus
2026
≈ 70%
Multiple ways to implement and infer sentience
in corpus
≈ 70%

Cited by (3)

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
At sufficient scale, LLMs linearly represent the truth or falsehood of factual statements in their internal activations — a claim supported by PCA visualizations, cross-dataset probe transfer, and cau
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Distributed alignment search (DAS) resolves two blocking limitations of prior causal abstraction work—brute-force alignment search and the localist assumption that high-level variables map to disjoint
Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts
Llama-3.1-8B solves cyclic arithmetic (e.g., "what month is six months after August?") not by performing modular addition in the period of the cyclic concept (12 for months, 7 for days of the week) as