concept
active
concept:monotonicity-natural-language-inference
Monotonicity Natural Language Inference
NLI task (MoNLI) in which the premise and hypothesis differ by a single word, replaced by a hypernym or hyponym, with the presence of negation as a variable.
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- Lexical Entailment · associated_with · The semantic relation between words wp and wh (entails/neutral) used as an intermediate variable in the MoNLI high-level model.
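The role of lexical entailment as an intermediate variable can be illustrated with a minimal sketch. It assumes wp is the word in the premise and wh its replacement in the hypothesis, that the two always stand in a hypernym/hyponym relation, and that negation simply reverses the lexical relation. The `HYPERNYMS` table and the function names are hypothetical and chosen for illustration only; they are not taken from the MoNLI dataset or any library.

```python
ENTAILS, NEUTRAL = "entails", "neutral"

# Hypothetical toy hypernym table: word -> set of its hypernyms.
# A real setup would draw these from a lexical resource instead.
HYPERNYMS = {
    "dog": {"animal"},
    "oak": {"tree"},
}


def lexical_entailment(wp: str, wh: str) -> str:
    """Intermediate variable: wp entails wh when wh is a hypernym of wp."""
    if wp == wh or wh in HYPERNYMS.get(wp, set()):
        return ENTAILS
    return NEUTRAL


def monli_label(wp: str, wh: str, negated: bool) -> str:
    """Sentence-level label from the lexical relation and the negation variable.

    Assumes wp and wh are a hypernym/hyponym pair, so reversing the
    relation under negation is meaningful.
    """
    lex = lexical_entailment(wp, wh)
    if not negated:
        return lex
    # Negation is downward monotone: the lexical relation is reversed.
    return NEUTRAL if lex == ENTAILS else ENTAILS


if __name__ == "__main__":
    # "A man is walking a dog" vs. "A man is walking an animal"
    print(monli_label("dog", "animal", negated=False))
    # "A man is not walking an animal" vs. "A man is not walking a dog"
    print(monli_label("animal", "dog", negated=True))
```

Keeping the lexical relation as a separate step mirrors the description above: the sentence-level label depends only on that relation and on whether negation is present.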
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Algorithmic framework for probabilistic inference in graphical models.
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model (Li et al., 2023) · concept · 0.751 · Safety intervention that relies on activation modification, which ESR might undermine
- Attributing subjective experience based on observable embodied behaviours.
- Towards Monosemanticity: Decomposing Language Models with Dictionary Learning (Bricken et al., 2023) · concept · 0.738 · Foundational SAE mechanistic interpretability paper
- Foundational framework by Karl Friston; the paper extends it to three hierarchical levels for modeling meta-awareness.
- Prior active inference paper providing detailed neurophysiological implementation of belief updates
- Property of truth directions: the probability of a truthful response scales monotonically with the activation addition coefficient.