community
active
leiden_hybrid_concepts
label: haiku
community:leiden_hybrid_concepts-run4-c0-c9

Probe-based data attribution for LLM safety

Cost-effective methods using probes to identify and intervene on harmful training data, achieving 63-84% behavior reduction at 10× lower cost than gradient methods.

10 members. Each node is clickable.

Loading graph…

Drawn from 3 sources

The papers/notes whose extracted claims & findings make up this cluster.

Bridges (4)

Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.

Findings (7)

Claims (3)