claim
active
claim:detecting-dormant-behavioral-changes-requires-evaluating-across-all-possible-contexts-which-is-infeasible-in-practice

Detecting dormant behavioral changes requires evaluating across all possible contexts, which is infeasible in practice

Practical limitation of current evaluation methods for pernicious divergence

Source paper

extracted_from
Addressing divergent representations from causal interventions on neural networks
(2025) · Satchel Grant · Simon Jerome Han · Alexa R. Tartaglini · Christopher Potts
