concept
active
concept:differential-sensitivity
differential sensitivity
The capacity to distinguish which of multiple sentences received an injection, or which received the stronger injection, in contrast to binary (yes/no) detection.
Neighborhood — ranked by edge-count
Concepts (1)
concept
- global logit shift · contradicts · The methodological confound identified by this paper: injection biases the model toward 'YES' on any binary question, regardless of content.
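
A toy sketch of why a global logit shift confounds binary detection while leaving a differential comparison intact. All numbers, names, and the threshold here are hypothetical illustrations, not values or methods from the paper: each sentence gets an assumed "was this injected?" YES logit, and the shift is modeled as a constant added to every one of them.

```python
import math

def p_yes(logit_yes, logit_no=0.0):
    """Probability of answering YES from a two-way softmax over YES/NO logits."""
    return 1.0 / (1.0 + math.exp(-(logit_yes - logit_no)))

def binary_detect(logit_yes, threshold=0.5):
    """Binary detection: report an injection if P(YES) exceeds the threshold."""
    return p_yes(logit_yes) > threshold

def pick_injected(logits):
    """Differential readout: index of the sentence with the highest YES logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical YES logits for three sentences; only sentence 2 is injected.
base = [-2.0, -1.5, 1.0]

# Assumed model of the confound: the same constant added to every YES logit.
shift = 3.0
shifted = [s + shift for s in base]

binary_before = [binary_detect(s) for s in base]
binary_after = [binary_detect(s) for s in shifted]
choice_before = pick_injected(base)
choice_after = pick_injected(shifted)

print(binary_before, binary_after)   # binary answers change under the shift
print(choice_before, choice_after)   # the ranking does not
```

Under this assumption the binary readout flips to YES for every sentence, whereas the choice among sentences is unchanged, because adding a constant to all scores preserves their ordering. A real shift need not be perfectly uniform, so this only illustrates the idealized case.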
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The phenomenon where life is created or destroyed by dimensional changes as small as a tenth of an inch.
- Sensitivity as a property common to all matter or as a result of the organization of matter (Diderot's hypothesis). · hypothesis · 0.746 · A simple hypothesis that explains everything, contrasted with the mechanistic view that creates mysteries.
- Deep responsiveness to local conditions, essential for a process to be living.
- Systematic modification of system prompt elements to identify which are necessary for alignment faking
- Historical example of judging scientific knowledge by ethnic origin of its creators — used as cautionary analogy for judging AI outputs by origin rather than quality
- Subtle variation and detail, as in pots of flowers, that brings life to a place.
- Equated with inference of past, present and future hidden states via minimization of variational free energy.
- Post-training alignment method during which undesirable behaviors emerged in the studied model.