method
pending-review
method:probe-based-data-attribution
Probe-Based Data Attribution
xiao-aranguri-probe-data-attribution-2026.md
Frontmatter (10 fields)
{
"author": null,
"context": "Linear classifier approach applied to model activations to identify which training datapoints caused undesired behaviors in post-training.",
"enrichment": {
"is_stale": true
},
"norm_label": "Probe-Based Data Attribution",
"source_url": null,
"graphify_id": "probe_based_data_attribution",
"source_file": "xiao-aranguri-probe-data-attribution-2026.md",
"imported_from": "/Users/antonborzov/Documents/Research.nosync/papers/extract_typed_out/xiao-aranguri-probe-data-attribution-2026/graph.json",
"extracted_type": "method",
"graphify_file_type": "method"
}
Outgoing (3)
About (1)
- Post-training alignment (concept)
Associated with (1)
- Activation space (concept)
Implements (1)
- OLMo 2 7B (dataset)
Mentions (1)
- papers-typed
xiao-aranguri-probe-data-attribution-2026.md
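
The `context` field above describes the method only in one sentence, so the following is a minimal sketch of the general idea rather than the source paper's implementation. It assumes activations have already been extracted as fixed-length vectors, that a small labeled set marks which activations accompany the undesired behavior, and that ranking training datapoints by probe score is an acceptable stand-in for attribution. All function names and the toy numbers are hypothetical.

```python
import math


def train_probe(acts, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe (weights, bias) by batch gradient descent."""
    dim = len(acts[0])
    w = [0.0] * dim
    b = 0.0
    n = len(acts)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(acts, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of the behavior
            err = p - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_score(w, b, x):
    """Probe logit for one activation vector; higher means more behavior-like."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def rank_training_points(w, b, train_acts):
    """Return training-point indices sorted from highest to lowest probe score."""
    scores = [probe_score(w, b, x) for x in train_acts]
    return sorted(range(len(train_acts)), key=lambda i: scores[i], reverse=True)


# Toy activations (hypothetical): label 1 = undesired behavior present.
probe_acts = [[2.0, 0.1], [1.5, -0.3], [2.5, 0.2],
              [-2.0, 0.0], [-1.5, 0.4], [-2.5, -0.1]]
probe_labels = [1, 1, 1, 0, 0, 0]

# Toy activations for four candidate training datapoints (hypothetical).
train_acts = [[-1.0, 0.5], [3.0, 0.0], [0.2, -0.2], [1.8, 0.3]]

w, b = train_probe(probe_acts, probe_labels)
ranking = rank_training_points(w, b, train_acts)
print(ranking)  # indices of training points, most behavior-like first
```

In this sketch the highest-ranked indices would be the candidates to inspect or filter from the post-training data; how the source work selects layers, builds labels, or validates the attribution is not stated in this record.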