label: haiku
community:leiden_hybrid_concepts-run4-c7-c10
Concept steering & catastrophic model failure
Studies how targeted interventions on learned concepts can cause sudden, global collapse in neural network performance.
2 members.
Drawn from 1 source
The papers/notes whose extracted claims & findings make up this cluster.
Bridges (2)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
Claims (1)
- Some sparse autoencoder (SAE) concept-steering interventions act as 'wrecking balls' that collapse global model performance rather than selectively modifying their target concepts.
  A critical failure mode identified in the paper, demonstrating the risk of naïve concept steering.
Findings (1)
- Some concept interventions act as 'wrecking balls', collapsing global model performance.
  Observation of a catastrophic performance drop when steering certain concepts.
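The claim and finding above concern steering a model through SAE concept latents. The following is only an illustrative sketch, not the source paper's method. It assumes a standard ReLU SAE with encoder weights `w_enc`, `b_enc` and decoder weights `w_dec`, `b_dec`. It clamps one latent to a chosen value and adds the SAE reconstruction error back, so only the change caused by the clamped latent reaches the activation. It then flags a "wrecking ball" by comparing a held-out loss before and after steering. The names `steer` and `is_wrecking_ball` and the default threshold of 0.5 are all hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # row-major: Matrix[row][col]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sae_encode(x: Sequence[float], w_enc: Matrix, b_enc: Sequence[float],
               b_dec: Sequence[float]) -> Vector:
    """Assumed ReLU SAE encoder: z = ReLU(W_enc (x - b_dec) + b_enc)."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    return [max(0.0, p) for p in pre]


def sae_decode(z: Sequence[float], w_dec: Matrix,
               b_dec: Sequence[float]) -> Vector:
    """Assumed linear SAE decoder: x_hat = W_dec z + b_dec."""
    return [r + b for r, b in zip(matvec(w_dec, z), b_dec)]


def steer(x: Sequence[float], w_enc: Matrix, b_enc: Sequence[float],
          w_dec: Matrix, b_dec: Sequence[float],
          latent: int, value: float) -> Vector:
    """Hypothetical helper: clamp one SAE latent and return the steered activation.

    The reconstruction error (x - x_hat) is kept, so the result equals
    x + (x_hat_steered - x_hat): only the clamped latent's change is applied.
    """
    z = sae_encode(x, w_enc, b_enc, b_dec)
    x_hat = sae_decode(z, w_dec, b_dec)
    z_steered = list(z)
    z_steered[latent] = value
    x_hat_steered = sae_decode(z_steered, w_dec, b_dec)
    return [xi + (s - h) for xi, s, h in zip(x, x_hat_steered, x_hat)]


def is_wrecking_ball(base_loss: float, steered_loss: float,
                     max_rel_increase: float = 0.5) -> bool:
    """Hypothetical check: flag steering whose held-out loss rises too much.

    Assumes base_loss > 0. The 0.5 default threshold is arbitrary.
    """
    return (steered_loss - base_loss) / base_loss > max_rel_increase


if __name__ == "__main__":
    # Toy 2-dimensional SAE with identity weights and zero biases.
    eye = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [0.0, 0.0]
    # The first coordinate is negative, so ReLU zeroes latent 0 and leaves
    # a reconstruction error of -1.0 that steer() adds back.
    print(steer([-1.0, 2.0], eye, zeros, eye, zeros, latent=0, value=3.0))
    print(is_wrecking_ball(base_loss=1.0, steered_loss=4.0))
```

In a real setting, `steered_loss` would come from running the model with the steered activation spliced into its forward pass, which this sketch leaves out. A selective intervention would leave that loss close to `base_loss`. A wrecking-ball intervention would not.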