Neural natural language inference models partially embed theories of lexical entailment and negation

By Atticus Geiger · Kyle Richardson · Christopher Potts

DOI 10.18653/v1/2020.blackboxnlp-1.16

Original abstract

We address whether neural models for Natural Language Inference (NLI) can learn the compositional interactions between lexical entailment and negation, using four methods: the behavioral evaluation methods of (1) challenge test sets and (2) systematic generalization tasks, and the structural evaluation methods of (3) probes and (4) interventions. To facilitate this holistic evaluation, we present Monotonicity NLI (MoNLI), a new naturalistic dataset focused on lexical entailment and negation. In our behavioral evaluations, we find that models trained on general-purpose NLI datasets fail systematically on MoNLI examples containing negation, but that MoNLI fine-tuning addresses this failure. In our structural evaluations, we look for evidence that our top-performing BERT-based model has learned to implement the monotonicity algorithm behind MoNLI. Probes yield evidence consistent with this conclusion, and our intervention experiments bolster this, showing that the causal dynamics of the model mirror the causal dynamics of this algorithm on subsets of MoNLI. This suggests that the BERT model at least partially embeds a theory of lexical entailment and negation at an algorithmic level.
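The monotonicity algorithm the abstract refers to can be illustrated with a small sketch. This is a reading of the abstract, not the authors' code: it assumes a two-label task (entailment vs. neutral), assumes each example differs by one substituted word that is a hyponym or hypernym of the original, and uses a hypothetical toy lexicon `HYPERNYMS`. The `lexrel_override` parameter is also an illustrative assumption; it mimics an intervention by swapping in the intermediate lexical relation computed for a different example.

```python
from enum import Enum
from typing import Dict, Optional, Set


class Label(Enum):
    ENTAILMENT = "entailment"
    NEUTRAL = "neutral"


# Hypothetical toy lexicon (illustrative only): word -> set of its hypernyms.
HYPERNYMS: Dict[str, Set[str]] = {
    "dog": {"mammal", "animal"},
    "mammal": {"animal"},
}


def lexrel(premise_word: str, hypothesis_word: str) -> Label:
    """Entailment iff the hypothesis word is a hypernym of the premise word."""
    if hypothesis_word in HYPERNYMS.get(premise_word, set()):
        return Label.ENTAILMENT
    return Label.NEUTRAL


def reverse(label: Label) -> Label:
    """Flip the relation, as negation (a downward-monotone context) does.

    Assumption: the two words always stand in a hyponym/hypernym relation in
    one direction, so 'not entailment' one way means entailment the other way.
    """
    return Label.NEUTRAL if label is Label.ENTAILMENT else Label.ENTAILMENT


def infer(
    premise_word: str,
    hypothesis_word: str,
    negated: bool,
    lexrel_override: Optional[Label] = None,
) -> Label:
    """Compute the lexical relation, then reverse it if the context is negated.

    If lexrel_override is given, it replaces the intermediate lexical relation,
    which simulates an intervention on that variable.
    """
    rel = lexrel(premise_word, hypothesis_word) if lexrel_override is None else lexrel_override
    return reverse(rel) if negated else rel


# "owns a dog" vs. "owns a mammal": no negation, relation passes through.
plain = infer("dog", "mammal", negated=False)
# "does not own a dog" vs. "does not own a mammal": negation reverses it.
negated = infer("dog", "mammal", negated=True)
# Intervention: keep the negated base example, but take the lexical relation
# from a source example whose word pair runs in the opposite direction.
swapped = infer("dog", "mammal", negated=True, lexrel_override=lexrel("mammal", "dog"))
print(plain.value, negated.value, swapped.value)
```

In this sketch, swapping the intermediate variable changes the output for the negated example from neutral to entailment. The intervention experiments described in the abstract ask whether a comparable swap of internal representations in the BERT-based model changes its predictions in the same way.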

Related work (refs + corpus + external arXiv)

Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows weighted higher.

Neural representation in active inference: using generative models to interact with -- and understand -- the lived world
Leo D'Amato, Francesco Mannella, Matteo Priorelli, Toon Van de Maele, Ivilin Peev Stoianov, Karl Friston, Giovanni Pezzulo
2023
≈ 77%
A New Approach for Knowledge Generation Using Active Inference
Nazanin Movarraei, Jamshid Ghasimi
2025
≈ 77%
Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling
Nathan Schneider, Lingpeng Kong, Jakob Prange
2026
≈ 76%
A Simple Generative Model of Logical Reasoning and Statistical Learning
Hiroyuki Kido
2026
≈ 76%
The Neural Coding Framework for Learning Generative Models
Alexander Ororbia and Daniel Kifer
2022
≈ 76%
Evaluating Neural Language Models as Cognitive Models of Language Acquisition
Annika Lea Heuser, Charles Yang, Jordan Kodner, Héctor Javier Vázquez Martínez
2026
≈ 75%
Understanding Epistemic Language with a Language-augmented Bayesian Theory of Mind
Tan Zhi-Xuan, Lionel Wong, Vikash Mansinghka, Joshua B. Tenenbaum, Lance Ying
2025
≈ 75%
Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs
Ronan LeBras, Daniel Fried, Yejin Choi, Maarten Sap
2023
≈ 75%
Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models
Dongkwan Kim, Jiho Jin, Jiseon Kim, Yeon Seonwoo, Yejin Choi, Alice Oh, Hyunwoo Kim, Chani Jung
2024
≈ 74%
Learning by Abstraction: The Neural State Machine
Drew A. Hudson and Christopher D. Manning
2019
≈ 74%
Modeling Human Behavior Part I -- Learning and Belief Approaches
Andrew Fuchs and Andrea Passarella and Marco Conti
2022
≈ 74%
Neural dynamics under active inference: plausibility and efficiency of information processing
Thomas Parr, Biswa Sengupta, Karl Friston, Lancelot Da Costa
2021
≈ 74%
Simulating Biological Intelligence: Active Inference with Experiment-Informed Generative Model
Moein Khajehnejad, Forough Habibollahi, Brett J. Kagan, Adeel Razi, Aswin Paul
2025
≈ 74%
Inference of Abstraction for a Unified Account of Symbolic Reasoning from Data
Hiroyuki Kido
2026
≈ 74%
Convergent Representations of Linguistic Constructions in Human and Artificial Neural Systems
Thomas Kinfe, Andreas Maier, Achim Schilling, Patrick Krauss, Pegah Ramezani
2026
≈ 74%
Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
in corpus
≈ 73%
A tale of two densities: active inference is enactive inference
in corpus
2020
≈ 73%
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
in corpus
2023
≈ 71%
When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models
in corpus
2025
≈ 70%
Active inference on discrete state-spaces: a synthesis
in corpus
2020
≈ 70%
Multiple ways to implement and infer sentience
in corpus
≈ 70%
Active Inference, Curiosity and Insight
in corpus
2017
≈ 70%
Active Inference: A Process Theory
in corpus
2017
≈ 70%
Paper Summary: Interpreting Language Model Parameters
in corpus
≈ 70%
The Platonic Representation Hypothesis
in corpus
2024
≈ 69%
Cognitive glues are shared models of relative scarcities: the economics of collective intelligence
in corpus
2026
≈ 69%
A Free energy principle for the brain (lecture summary)
in corpus
2008
≈ 69%
Why Learning Requires Feeling
in corpus
2026
≈ 68%

Cited by (7)

Addressing divergent representations from causal interventions on neural networks
Causal intervention methods central to mechanistic interpretability—including activation patching, mean-difference vector patching, Sparse Autoencoders, and Distributed Alignment Search (DAS)—systemat
Model Alignment Search
Model Alignment Search (MAS) establishes bidirectional causal similarity between neural networks by learning a per-model orthogonal rotation matrix that isolates behaviorally relevant subspaces and us
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
pyvene is an open-source Python library that unifies intervention-based research on PyTorch neural models by treating the intervention itself—rather than model surgery code—as the primitive abstractio
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
At sufficient scale, LLMs linearly represent the truth or falsehood of factual statements in their internal activations — a claim supported by PCA visualizations, cross-dataset probe transfer, and cau
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Distributed alignment search (DAS) resolves two blocking limitations of prior causal abstraction work—brute-force alignment search and the localist assumption that high-level variables map to disjoint
Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
Manifold steering — intervening on model activations along paths constrained to lie on a learned activation manifold M_h rather than along Euclidean linear directions — produces behavioral trajectorie
Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts
Llama-3.1-8B solves cyclic arithmetic (e.g., "what month is six months after August?") not by performing modular addition in the period of the cyclic concept (12 for months, 7 for days of the week) as