claim

active

The harm of divergence is inherently claim-dependent: the same divergence can be harmless for one mechanistic claim and pernicious for another

An important nuance: because harm depends on the claim being made, divergence cannot be universally classified as always harmless or always harmful.
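A toy sketch of this claim-dependence, under strong simplifying assumptions: each mechanistic claim is tied to a hypothetical linear readout, and a divergence counts as harmless for that claim when it lies in the readout's null-space. The names `readout_a`, `readout_b`, `divergence`, and `changes_output` are illustrative only and do not come from the source paper.

```python
import numpy as np

def changes_output(readout, divergence, tol=1e-9):
    """Return True if adding `divergence` to a representation changes `readout @ h`."""
    return bool(np.linalg.norm(readout @ divergence) > tol)

# Hypothetical linear readouts, one per mechanistic claim (an assumption for illustration).
readout_a = np.array([[1.0, 0.0, 0.0]])  # claim A depends only on dimension 0
readout_b = np.array([[0.0, 0.0, 1.0]])  # claim B depends only on dimension 2

# One and the same divergence between an intervened and a natural representation.
divergence = np.array([0.0, 0.0, 0.5])

# The divergence sits in the null-space of readout_a but not of readout_b.
harmless_for_a = not changes_output(readout_a, divergence)
pernicious_for_b = changes_output(readout_b, divergence)

print(harmless_for_a, pernicious_for_b)
```

In this toy setting the single vector `divergence` is invisible to the readout behind claim A and visible to the one behind claim B, so no claim-independent label for it exists.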

Source paper

extracted_from

(2025) · Satchel Grant · Simon Jerome Han · Alexa R. Tartaglini · Christopher Potts

claim

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

How can we produce a principled method for classifying harmful divergence for any mechanistic claim? · question · 0.818
Identified gap: current work lacks a general method for harmful divergence classification
No principled method exists for classifying harmful divergence for arbitrary mechanistic claims · concept · 0.785
Explicitly identified limitation: the paper cannot classify perniciousness in general
When, and to what extent, is it okay for divergences to occur? · question · 0.780
Second core research question motivating the theoretical analysis in Section 4
Harmless Divergence · concept · 0.780
Divergences that occur in the behavioral null-space and do not affect functional claims about the model
Any divergence outside of the null-space of NN layers is potentially pernicious, posing challenges for a complete mechanistic understanding of NNs · claim · 0.767
Sobering conclusion about the fundamental challenge posed by divergence for mechanistic interpretability
Pernicious Divergence · concept · 0.762
Divergences that activate hidden pathways or cause dormant behavioral changes, undermining mechanistic claims
When it is not okay, how can we prevent divergent representations from occurring? · question · 0.760
Third core research question motivating the CL loss approach in Section 5
Minimizing divergence magnitude does not guarantee elimination of hidden pathways; it only reduces the risk surface · claim · 0.753
Important caveat to the CL loss solution: it is a step toward a fix, not a complete one