claim
active
claim:probe-based-data-attribution-effectively-reduces-harmful-behaviors-via-data-interventions

Probe-based data attribution effectively reduces harmful behaviors via data interventions

The authors' central interpretive claim: that their method, probe-based data attribution followed by data interventions, meaningfully mitigates unwanted behaviors.

Source paper

extracted_from
Probe-Based Data Attribution: Surfacing and Mitigating Undesirable Behaviors in LLM Post-Training
(2026) · Frank Xiao · Santiago Aranguri

Neighborhood — ranked by edge-count

Findings (5)

Communities (3)

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.