paper:doi-10-18653-v1-2020-blackboxnlp-1-16
Neural natural language inference models partially embed theories of lexical entailment and negation
Original abstract
We address whether neural models for Natural Language Inference (NLI) can learn the compositional interactions between lexical entailment and negation, using four methods: the behavioral evaluation methods of (1) challenge test sets and (2) systematic generalization tasks, and the structural evaluation methods of (3) probes and (4) interventions. To facilitate this holistic evaluation, we present Monotonicity NLI (MoNLI), a new naturalistic dataset focused on lexical entailment and negation. In our behavioral evaluations, we find that models trained on general-purpose NLI datasets fail systematically on MoNLI examples containing negation, but that MoNLI fine-tuning addresses this failure. In our structural evaluations, we look for evidence that our top-performing BERT-based model has learned to implement the monotonicity algorithm behind MoNLI. Probes yield evidence consistent with this conclusion, and our intervention experiments bolster this, showing that the causal dynamics of the model mirror the causal dynamics of this algorithm on subsets of MoNLI. This suggests that the BERT model at least partially embeds a theory of lexical entailment and negation at an algorithmic level.
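The abstract refers to a monotonicity algorithm and to interventions on its causal dynamics without spelling either out. The following is a minimal sketch of one way such an algorithm could look, not the authors' implementation. It assumes that every example substitutes one word for a hyponym or hypernym of it, that labels are restricted to entailment and neutral, and that negation simply reverses the lexical relation. The names `Example`, `HYPERNYMS`, `get_lexrel`, `infer`, and `interchange` are hypothetical and the hypernym table is toy data.

```python
from dataclasses import dataclass
from typing import Optional

ENTAILMENT = "entailment"
NEUTRAL = "neutral"

# Toy hypernym table (illustrative data, not taken from MoNLI).
HYPERNYMS = {"dog": {"animal"}, "elm": {"tree"}}


@dataclass(frozen=True)
class Example:
    premise_word: str      # word appearing in the premise
    hypothesis_word: str   # word substituted for it in the hypothesis
    negated: bool          # assumption: the substituted word is in the scope of "not"


def get_lexrel(ex: Example) -> str:
    """Lexical relation: entailment if the hypothesis word is a hypernym of the premise word."""
    if ex.hypothesis_word in HYPERNYMS.get(ex.premise_word, set()):
        return ENTAILMENT
    return NEUTRAL


def reverse(label: str) -> str:
    """Swap the two labels. Only valid under the assumption that the word
    pair is a hyponym/hypernym pair in one direction or the other."""
    return NEUTRAL if label == ENTAILMENT else ENTAILMENT


def infer(ex: Example, lexrel_override: Optional[str] = None) -> str:
    """Compute the label; negation reverses the lexical relation."""
    lexrel = get_lexrel(ex) if lexrel_override is None else lexrel_override
    return reverse(lexrel) if ex.negated else lexrel


def interchange(base: Example, source: Example) -> str:
    """Interchange intervention on the high-level algorithm: run `base`
    with the intermediate lexical relation taken from `source`."""
    return infer(base, lexrel_override=get_lexrel(source))


if __name__ == "__main__":
    print(infer(Example("dog", "animal", negated=False)))
    print(infer(Example("dog", "animal", negated=True)))
    print(interchange(Example("dog", "animal", True), Example("animal", "dog", False)))
```

In this sketch the intermediate variable `lexrel` is the quantity an intervention would target: if a neural model's internal representation plays the same causal role, swapping that representation between two inputs should change the model's output the way `interchange` changes the algorithm's output.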
Related work: refs + corpus + external arXiv
Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows weighted higher.
- Neural representation in active inference: using generative models to interact with -- and understand -- the lived world. Leo D'Amato, Francesco Mannella, Matteo Priorelli, Toon Van de Maele, Ivilin Peev Stoianov, Karl Friston, Giovanni Pezzulo (2023) ≈ 77%
- Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling. Nathan Schneider, Lingpeng Kong, Jakob Prange (2026) ≈ 76%
- Evaluating Neural Language Models as Cognitive Models of Language Acquisition. Annika Lea Heuser, Charles Yang, Jordan Kodner, Héctor Javier Vázquez Martínez (2026) ≈ 75%
- Understanding Epistemic Language with a Language-augmented Bayesian Theory of Mind. Tan Zhi-Xuan, Lionel Wong, Vikash Mansinghka, Joshua B. Tenenbaum, Lance Ying (2025) ≈ 75%
- Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs. Ronan LeBras, Daniel Fried, Yejin Choi, Maarten Sap (2023) ≈ 75%
- Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models. Dongkwan Kim, Jiho Jin, Jiseon Kim, Yeon Seonwoo, Yejin Choi, Alice Oh, Hyunwoo Kim, Chani Jung (2024) ≈ 74%
- Modeling Human Behavior Part I -- Learning and Belief Approaches. Andrew Fuchs, Andrea Passarella, Marco Conti (2022) ≈ 74%
- Neural dynamics under active inference: plausibility and efficiency of information processing. Thomas Parr, Biswa Sengupta, Karl Friston, Lancelot Da Costa (2021) ≈ 74%
- Simulating Biological Intelligence: Active Inference with Experiment-Informed Generative Model. Moein Khajehnejad, Forough Habibollahi, Brett J. Kagan, Adeel Razi, Aswin Paul (2025) ≈ 74%
- Convergent Representations of Linguistic Constructions in Human and Artificial Neural Systems. Thomas Kinfe, Andreas Maier, Achim Schilling, Patrick Krauss, Pegah Ramezani (2026) ≈ 74%
- Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations (in corpus, 2023) ≈ 71%
- When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models (in corpus, 2025) ≈ 70%
- Active Inference, Curiosity and Insight (in corpus, 2017) ≈ 70%
- Active Inference: A Process Theory (in corpus, 2017) ≈ 70%
- The Platonic Representation Hypothesis (in corpus, 2024) ≈ 69%
- Cognitive glues are shared models of relative scarcities: the economics of collective intelligence (in corpus, 2026) ≈ 69%
- Why Learning Requires Feeling (in corpus, 2026) ≈ 68%
Cited by (7)
- Addressing divergent representations from causal interventions on neural networks
Causal intervention methods central to mechanistic interpretability—including activation patching, mean-difference vector patching, Sparse Autoencoders, and Distributed Alignment Search (DAS)—systemat
- Model Alignment Search
Model Alignment Search (MAS) establishes bidirectional causal similarity between neural networks by learning a per-model orthogonal rotation matrix that isolates behaviorally relevant subspaces and us
- pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
pyvene is an open-source Python library that unifies intervention-based research on PyTorch neural models by treating the intervention itself—rather than model surgery code—as the primitive abstractio
- The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
At sufficient scale, LLMs linearly represent the truth or falsehood of factual statements in their internal activations — a claim supported by PCA visualizations, cross-dataset probe transfer, and cau
- Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Distributed alignment search (DAS) resolves two blocking limitations of prior causal abstraction work—brute-force alignment search and the localist assumption that high-level variables map to disjoint
- Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
Manifold steering — intervening on model activations along paths constrained to lie on a learned activation manifold M_h rather than along Euclidean linear directions — produces behavioral trajectorie
- Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts
Llama-3.1-8B solves cyclic arithmetic (e.g., "what month is six months after August?") not by performing modular addition in the period of the cyclic concept (12 for months, 7 for days of the week) as