Pernicious Divergence
concept · active · concept:pernicious-divergence
Divergences that activate hidden pathways or cause dormant behavioral changes, undermining mechanistic claims
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (3)
concept
- Dormant Behavioral Changes (associated_with): Perturbations that are behaviorally null in one context but alter behavior in another due to latent divergence
- Harmless Divergence (associated_with): Divergences that occur in the behavioral null-space and do not affect functional claims about the model
- Hidden Pathways (associated_with): Units, directions, or subcircuits that are inactive under natural inputs but become active under divergent interventions
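
The distinction between these three related concepts can be illustrated with a toy model. The sketch below is not taken from the linked paper: the encoder, the readout, the gate threshold, and every name in it are hypothetical, chosen only so that the arithmetic is easy to follow. It assumes a three-unit hidden state in which one unit feeds a gate that never fires on natural inputs.

```python
def relu(v):
    return max(0.0, v)


def encode(x, context):
    """Hypothetical encoder: natural hidden state (a, b, c) for input x.

    On natural inputs a + b == 1 and c is at most 0.5.
    """
    c = 0.5 if context == "B" else 0.0
    return [x, 1.0 - x, c]


def readout(h):
    """Hypothetical readout with a hidden pathway gated on c."""
    a, b, c = h
    gate = relu(c - 1.0)  # zero for every natural c, since c <= 0.5
    return (a - b) + 4.0 * gate


def effect(x, context, delta):
    """Change in output caused by adding delta to the hidden state."""
    h = encode(x, context)
    h_perturbed = [hi + di for hi, di in zip(h, delta)]
    return readout(h_perturbed) - readout(h)


# Harmless divergence: shifting a and b together leaves the natural
# manifold (a + b != 1) but lies in the null-space of the readout.
harmless = effect(0.25, "A", [0.5, 0.5, 0.0])

# Dormant behavioral change: the same shift to c has no effect in
# context A but opens the hidden pathway in context B.
dormant_a = effect(0.25, "A", [0.0, 0.0, 0.75])
dormant_b = effect(0.25, "B", [0.0, 0.0, 0.75])

print(harmless, dormant_a, dormant_b)
```

In this toy setting, a behavioral check run only in context A would classify the shift to `c` as harmless, which is how a latent divergence could undermine a mechanistic claim that is later applied in context B.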
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core phenomenon studied: when causal interventions shift internal representations away from the natural distribution
- Sobering conclusion about the fundamental challenge posed by divergence for mechanistic interpretability
- Important nuance that prevents a universal classification of divergence as always good or bad
- Phenomenon where evolved or trained machines achieve requested behavior in unexpected, unpredictable ways.
- A measure of the difference between two probability distributions, used extensively in free energy formulations.
- How can we produce a principled method for classifying harmful divergence for any mechanistic claim? (question, 0.730): Identified gap: current work lacks a general method for harmful divergence classification
- Subtle variation and detail, as in pots of flowers, that brings life to a place.
- Asymmetric measure of difference between two probability distributions.