concept
active
concept:off-target-effects

Off-target effects

Unintended changes in model behavior when performing edits; compared between VPD editing and fine-tuning.

Neighborhood — ranked by edge-count

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Metric introduced to quantify steering selectivity by comparing the area of target and off-target concept probes.
  • Metric for intervention effectiveness: 0 = ineffective, 1 = full flip of model output from false to true or vice versa
  • Field effectconcept0.735
    An emergent ordering created by a well-arranged structure of centers of different sizes, binding them into a whole.
  • tracer effectconcept0.735
    Phenomenon where a moment bleeds into adjacent moments temporally; includes deja vu as reversed direction.
  • Follow-up to Churchill's quote.
  • Eliza Effectconcept0.720
    The phenomenon where naive or vulnerable users perceive dialogue agents as having human-like desires and feelings, putting them at risk of emotional manipulation
  • butterfly effectconcept0.709
    Tiny differences in initial conditions can lead to vastly divergent outcomes, explaining why living structure must be created dynamically.