KL Divergence Retention Evaluation
method · active · method:kl-divergence-retention-evaluation
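Measures the KL divergence between the original model's output distributions and the post-intervention model's output distributions on Alpaca prompts, to assess how well behavior unrelated to the intervention is preserved.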
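The following is a minimal sketch of this retention metric. It assumes access to per-position next-token logits from both models, with both scored on the same Alpaca prompt tokens (teacher forcing), so that their shapes line up. The helper names (`softmax`, `kl_divergence`, `retention_kl`), the direction KL(original ‖ post-intervention), and averaging over all prompts and token positions are illustrative assumptions, not the confirmed implementation. Toy logits stand in for real model outputs.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(P || Q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def retention_kl(original_logits, intervened_logits) -> float:
    """Hypothetical retention score: mean KL(P_orig || P_post) over prompts and positions.

    Each argument is a list of prompts; each prompt is a list of token positions;
    each position is a list of logits over the vocabulary. Shapes are assumed aligned.
    """
    total, count = 0.0, 0
    for orig_prompt, post_prompt in zip(original_logits, intervened_logits):
        for orig_pos, post_pos in zip(orig_prompt, post_prompt):
            total += kl_divergence(softmax(orig_pos), softmax(post_pos))
            count += 1
    return total / count if count else 0.0


# Toy data: one prompt, two token positions, vocabulary of size 3.
orig = [[[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]]
post = [[[2.0, 1.0, 0.1], [0.5, 1.5, 2.0]]]

print(retention_kl(orig, orig))      # 0.0 (identical outputs, perfect retention)
print(retention_kl(orig, post) > 0)  # True (the intervention shifted one position)
```

Lower scores indicate better behavioral retention. Because KL divergence is asymmetric, the chosen direction matters. Putting the original model first penalizes the intervened model for assigning low probability to tokens the original model considered likely.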
Neighborhood (ranked by edge count)
Concepts (1)
concept
- Behavioral Retention · implements: The preservation of unrelated model capabilities after a targeted intervention, operationalized via KL divergence on Alpaca prompts
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- A measure of the difference between two probability distributions, used extensively in free energy formulations.
- Asymmetric measure of difference between two probability distributions.
- Core phenomenon studied: when causal interventions shift internal representations away from the natural distribution
- Practical utility of reducing divergence demonstrated through regression analysis
- Control framework minimizing expected complexity; shown to be a special case of expected free energy minimization
- How can we produce a principled method for classifying harmful divergence for any mechanistic claim? · question · 0.680: Identified gap: current work lacks a general method for harmful divergence classification
- IID mass-mean probing coincides with LDA when covariance is known; used to derive the corrected probe formula
- Case study confirming that PMI-based learning in different modalities recovers the same perceptual representation