concept
active
concept:inference-time-intervention-eliciting-truthful-answers-from-a-language-model-li-et-al-2023
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model (Li et al., 2023)
Safety intervention that relies on activation modification, which ESR might undermine
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- AI Alignment and Safety · associated_with · The broader domain for which ESR has dual implications: resistance to adversarial manipulation vs. interference with safety interventions
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Method by Li et al. 2023a that adds static vectors to model activations at inference time to steer behavior
- RLHF paper cited as a major fine-tuning technique used in commercial dialogue agents
- Empirical gap explicitly acknowledged; experiments reportedly in progress at time of writing
- Key limitation identified: NLAs hallucinate specific details while preserving thematic accuracy; informs practical usage.
- Chain-of-thought prompting elicits reasoning in large language models (Wei et al., 2022) · concept · 0.780 · Foundational paper on CoT prompting cited as basis for reasoning LLM training
- Predictive hypothesis about Contemplative Architecture approach based on Petersen et al. 2025 work
- The core motivating question of the paper, framed by Christiano et al. (2021)
- Prior active inference paper providing detailed neurophysiological implementation of belief updates