concept
active
concept:off-target-effectsOff-target effects
Unintended changes in model behavior when performing edits; compared between VPD editing and fine-tuning.
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edgeEntities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
- Metric introduced to quantify steering selectivity by comparing the area of target and off-target concept probes.
- Metric for intervention effectiveness: 0 = ineffective, 1 = full flip of model output from false to true or vice versa
- An emergent ordering created by a well-arranged structure of centers of different sizes, binding them into a whole.
- Phenomenon where a moment bleeds into adjacent moments temporally; includes deja vu as reversed direction.
- Follow-up to Churchill's quote.
- The phenomenon where naive or vulnerable users perceive dialogue agents as having human-like desires and feelings, putting them at risk of emotional manipulation
- Tiny differences in initial conditions can lead to vastly divergent outcomes, explaining why living structure must be created dynamically.