finding
active
An intervention benign at context v4 < 0.75 produces a class-C behavioral flip at 0.75 < v4 < 1, demonstrating dormant behavioral changes from latent divergence
Synthetic example of an intervention that appears safe in the tested contexts (v4 < 0.75) but changes behavior in others (0.75 < v4 < 1).
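A minimal toy sketch of the mechanism named in the title, assuming a one-dimensional latent that grows linearly with the context variable v4 and a hard-threshold readout. This is not the authors' construction. `THRESHOLD`, `DELTA`, `latent`, `behavior`, and `flips` are hypothetical names chosen for illustration. The intervention shifts the latent by the same amount in every context, so the latent diverges everywhere. Behavior changes only where the shifted latent crosses the threshold, which here is 0.75 < v4 < 1.

```python
# Hypothetical toy model: NOT the source paper's setup.
THRESHOLD = 1.0  # hypothetical decision boundary of the behavioral readout
DELTA = 0.25     # hypothetical size of the intervention's shift to the latent


def latent(v4: float, intervened: bool) -> float:
    """Toy latent that grows linearly with the context variable v4."""
    h = v4
    if intervened:
        h += DELTA  # the intervention moves the latent in every context
    return h


def behavior(v4: float, intervened: bool = False) -> int:
    """Binary readout: 1 once the latent reaches the threshold, else 0."""
    return int(latent(v4, intervened) >= THRESHOLD)


def flips(contexts):
    """For each context, report whether the intervention changes behavior."""
    return [behavior(v, True) != behavior(v, False) for v in contexts]


tested = [0.0, 0.25, 0.5, 0.7]  # contexts with v4 < 0.75
untested = [0.8, 0.9, 0.95]     # contexts with 0.75 < v4 < 1
print(flips(tested))    # no flips: the intervention looks benign
print(flips(untested))  # every context flips
```

An evaluation restricted to v4 < 0.75 would call this intervention behaviorally null, even though its latent effect is identical in every context. That mirrors the evaluation limitation listed under Claims.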
Source paper
extracted_from (2025) · Satchel Grant · Simon Jerome Han · Alexa R. Tartaglini · Christopher Potts
Neighborhood — ranked by edge-count
Claims (1)
claim
- Practical limitation of current evaluation methods for pernicious divergence
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Illustrates sensitivity to anchors.
- Central empirical result showing causal coupling between representation and behavior geometry across multiple substrates and modalities.
- Perturbations behaviorally null in one context but altering behavior in another due to latent divergence
- Authors' caveat that conversational context persistence rather than internal emotion state persistence could explain findings
- Cited from Wang et al. 2025a as reason SDF is preferred over demonstration fine-tuning for realistic model organisms.
- Contrasts with temporal permutation where Span Representation dominates; suggests spatio permutation reveals different dynamics.
- Cross-base fine-tuning yields asymmetric transfer: B10 transfers most robustly, B9 least (finding · 0.746). In-base gains are accompanied by uneven OOD drops; higher-density priors are more robust.
- Key theoretical claim distinguishing harmless from pernicious divergence
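The selection rule behind "Related by similarity" (cosine ≥ 0.65 and no typed edge) can be sketched as follows. The sketch assumes that each entity has an embedding vector and that typed edges are stored as (source, target) pairs. The function names, the embeddings, and the edge storage are hypothetical and are not taken from the actual graph implementation.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity, score) pairs sorted by descending similarity.

    Keeps only entities whose similarity to `target` is at least `threshold`
    and that have no typed edge to `target` in either direction.
    """
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue  # already linked by a typed relation, so not a candidate
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Both edge directions are checked because a typed relation in either direction already links the two entities. Only the remaining high-similarity entities are surfaced as candidates for new edges or unrecognized duplicates.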