concept
active
concept:inference-time-intervention-eliciting-truthful-answers-from-a-language-model-li-et-al-2023

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model (Li et al., 2023)

A safety intervention that works by modifying model activations at inference time, which ESR might undermine.

Neighborhood (ranked by edge count)

Concepts (1)

concept
  • The broader domain in which ESR cuts both ways: it may help resist adversarial manipulation, but it may also interfere with safety interventions.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood that have no typed relation to this one. These are candidates for new edges or unrecognized duplicates.
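The "Related by similarity" rule (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal sketch that assumes each entity has an embedding vector and that typed edges are stored as unordered pairs. The function names, the data layout, and the highest-score-first ordering are hypothetical illustrations, not this graph's actual implementation.

```python
from math import sqrt

# Threshold taken from the "cosine ≥ 0.65" label; everything else is assumed.
SIMILARITY_THRESHOLD = 0.65


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar rather than dividing by zero
    return dot / (norm_a * norm_b)


def similarity_candidates(
    entity_id: str,
    embeddings: dict[str, list[float]],
    typed_edges: set[frozenset[str]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> list[tuple[str, float]]:
    """Hypothetical helper: entities above the threshold with no typed edge, best first."""
    query = embeddings[entity_id]
    candidates = []
    for other_id, vector in embeddings.items():
        if other_id == entity_id:
            continue
        # Skip entities already linked by a typed edge (direction ignored).
        if frozenset((entity_id, other_id)) in typed_edges:
            continue
        score = cosine(query, vector)
        if score >= threshold:
            candidates.append((other_id, score))
    # Ordering by score is an assumption; the page does not state how this list is ranked.
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)
```

With toy 2-D vectors such as `{"iti": [1.0, 0.0], "steering": [0.9, 0.1], "esr": [0.8, 0.6], "unrelated": [0.0, 1.0]}` and a typed edge between `"iti"` and `"esr"`, only `"steering"` survives: `"esr"` is excluded because it is already linked, and `"unrelated"` falls below the threshold.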