finding
active
finding:probe-based-method-is-approximately-10-cheaper-than-gradient-based-alternatives-30-vs-320-once-trained

Probe-based method is approximately 10× cheaper than gradient-based alternatives ($30 vs $320 once trained)

Cost efficiency finding: once trained, the probe-based approach costs about $30 to run, versus about $320 for gradient-based alternatives, a reduction of roughly 10×.
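
The "approximately 10×" figure follows directly from the two reported costs. A minimal arithmetic check, using only the $30 and $320 values stated in this finding (the variable names are illustrative, not from the paper):

```python
# Reported costs from this finding (USD), after the probe has been trained.
probe_cost_usd = 30.0
gradient_cost_usd = 320.0

# How many times more expensive the gradient-based alternatives are.
ratio = gradient_cost_usd / probe_cost_usd

# Fraction of the gradient-based cost that the probe-based method saves.
savings_fraction = 1 - probe_cost_usd / gradient_cost_usd

print(f"{ratio:.1f}x cheaper, {savings_fraction:.0%} saved")
# prints "10.7x cheaper, 91% saved"
```

The exact ratio is about 10.7, which the finding rounds to 10×.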

Source paper

extracted_from
Probe-Based Data Attribution: Surfacing and Mitigating Undesirable Behaviors in LLM Post-Training
(2026) · Frank Xiao · Santiago Aranguri

Neighborhood — ranked by edge-count

Claims (1)

claim

Communities (3)

community

Methods (1)

method
  • Linear classifier approach applied to model activations to identify which training datapoints caused undesired behaviors in post-training.
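
The general idea of a linear probe on activations can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the activations are synthetic 4-dimensional vectors, the probe is a plain logistic regression trained by gradient descent, and the function names (`train_linear_probe`, `probe_score`, `rank_datapoints`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_probe(activations, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe (weights w, bias b) by batch gradient descent."""
    dim, n = len(activations[0]), len(activations)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(activations, labels):
            err = sigmoid(dot(w, x) + b) - y  # gradient of the log loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_score(w, b, activation):
    """Probability the probe assigns to 'exhibits the undesired behavior'."""
    return sigmoid(dot(w, activation) + b)


def rank_datapoints(w, b, train_activations):
    """Indices of training datapoints, highest probe score first."""
    scores = [probe_score(w, b, a) for a in train_activations]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Synthetic stand-in for activations: dimension 0 carries the behavior signal (assumption).
rng = random.Random(0)
acts, labels = [], []
for label, mean in ((1, 2.0), (0, -2.0)):
    for _ in range(40):
        acts.append([rng.gauss(mean, 0.5)] + [rng.gauss(0.0, 1.0) for _ in range(3)])
        labels.append(label)

w, b = train_linear_probe(acts, labels)

# Two hypothetical training datapoints: index 1 lies along the behavior direction.
candidates = [[-3.0, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0]]
ranking = rank_datapoints(w, b, candidates)
```

Here `ranking` lists the candidate that lies along the behavior direction first. Scoring a datapoint needs only one dot product over its activation vector, with no backward pass through the model, which is one plausible reason a probe is cheap to apply once trained.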

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.