finding
active
finding:das-achieves-100-iia-for-combined-negation-and-lexical-entailment-model-on-monli-at-layer-9-intervention-size-256
DAS achieves 100% IIA for combined Negation and Lexical Entailment model on MoNLI at Layer 9, intervention size 256
DAS finds a perfect abstraction relation (100% interchange intervention accuracy) between BERT and a symbolic algorithm with negation and lexical entailment variables.
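To make the IIA figure concrete, here is a minimal sketch of how interchange intervention accuracy could be scored against a two-variable symbolic algorithm. Everything in it is an illustrative assumption rather than the paper's code: the names `high_level`, `interchange`, and `iia`, the boolean encoding of the negation and lexical entailment variables, and the entailment/neutral label set. The low-level model is represented only by a stand-in callable; in the actual experiment it would be BERT with a learned intervention at Layer 9.

```python
from itertools import product

# Hypothetical high-level algorithm (an assumption, not the paper's code):
# negation flips the label implied by the lexical entailment relation.
def high_level(neg: bool, lex_entails: bool) -> str:
    return "entailment" if neg != lex_entails else "neutral"

def interchange(base: dict, source: dict, variable: str) -> str:
    """Run the high-level model on `base`, but with `variable`
    set to the value it takes on `source`."""
    setting = dict(base)
    setting[variable] = source[variable]
    return high_level(setting["neg"], setting["lex_entails"])

def iia(low_level_predict, pairs, variable: str) -> float:
    """Fraction of (base, source) pairs where the intervened low-level
    prediction matches the intervened high-level prediction."""
    hits = sum(
        low_level_predict(b, s, variable) == interchange(b, s, variable)
        for b, s in pairs
    )
    return hits / len(pairs)

# All four variable settings, and every (base, source) pairing of them.
inputs = [{"neg": n, "lex_entails": l} for n, l in product([False, True], repeat=2)]
pairs = list(product(inputs, inputs))

# Stand-in low-level model that ignores the intervention entirely.
def ignores_source(base, source, variable):
    return high_level(base["neg"], base["lex_entails"])

perfect_score = iia(interchange, pairs, "neg")      # stand-in that tracks the variable
chance_score = iia(ignores_source, pairs, "neg")    # stand-in that does not
```

Under these assumptions, a low-level model whose intervened behavior always tracks the high-level variable scores 1.0, which is what a 100% IIA claim amounts to, while a model that ignores the swapped variable agrees only on the pairs where base and source already share its value.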
Source paper
extracted_from · (2023) · Atticus Geiger · Zhengxuan Wu · Christopher Potts · Thomas Icard +1
Neighborhood — ranked by edge-count
Claims (2)
claim
- Central claim motivating DAS over prior methods.
- Second central claim of the paper.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- DAS achieves 100% IIA on hierarchical equality task with |N|=16, intervention size 8, Layer 1 · finding · 0.795 · DAS discovers a perfect alignment between the feed-forward network and the Both Equality Relations high-level model.
- In contrast to hierarchical equality, lexical entailment in BERT decomposes into representations of word identities, not a single abstract relation.
- Suggestive evidence for language-independent truth representation in LLMs
- Key asymmetry between hierarchical equality and NLI experiments; BERT stores identities rather than the abstract relation.
- Establishes generalizability of the core difficulty-boundary finding across model families.
- Corroborating result on additional task confirming main paper findings
- DAS behavioral loss achieves IIA of 0.997±0.001 on synthetic 10-class dataset training/test sets · finding · 0.728 · IIA baseline for DAS behavioral loss on synthetic dataset
- Truth-related directions reliably emerge at 60–75% of normalized layer depth in Qwen and Gemma models · finding · 0.727 · Experiment 1 finding localizing where truth can be causally mediated