finding
active
finding:concept-interventions-on-some-concepts-act-as-wrecking-ball-interventions-collapsing-global-model-performance
Concept interventions on some concepts act as 'wrecking-ball' interventions, collapsing global model performance.
Observation of a catastrophic drop in performance when certain concepts are steered.
Source paper
extracted_from(2026) · William Lehn-Schiøler · Magnus Ruud Kjær · Rahul Thapa · M. Pedersen +9
Neighborhood, ranked by edge count
Claims (1)
claim
- A critical failure mode identified in the paper, demonstrating the risk of naïve concept steering.
Communities (3)
community
- Explores geometry of activation/behavior manifolds to enable selective, non-destructive concept interventions.
- Concepts encoded as curved manifolds and circular structures in LLM activation spaces.
- Studies how targeted interventions on learned concepts can cause sudden, global collapse in neural network performance.
Questions (1)
question
- Question about the feasibility of safe concept steering in EEG models.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Load-bearing phrase describing catastrophic steering effects.
- Demonstrates a critical failure mode of concept steering, with clinical safety implications.
- Can concept steering interventions on EEG foundation models be made selective rather than globally destructive? (question · 0.771) Research question motivating the introduction of the probe area metric and the identification of operational regimes.
- Type of concept steering intervention that catastrophically collapses global model performance.
- Piggybacking a new purpose onto an existing concept (overloading) causes conflicts and design flaws. (claim · 0.743) Illustrated with an OS X print-subsystem example.
- Motivation claim contrasting pyvene with prior tools such as BauKit, TransformerLens, nnsight, and graphpatch.
- Additional synthetic example of pernicious divergence from balanced subspaces
- Survey of representation engineering methods cited as related work
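The "Related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the graph's actual implementation: it assumes each entity has a plain embedding vector, that typed edges are stored as unordered pairs of entity ids, and that results are ranked by descending cosine score. The function names and example ids are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_untyped_neighbors(target_id, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: entities whose cosine to target_id is >= threshold
    and that share no typed edge with it, sorted by score (highest first).

    Assumptions: embeddings maps entity id -> vector; typed_edges is a set of
    frozenset({id_a, id_b}) pairs, so edge direction is ignored.
    """
    target = embeddings[target_id]
    results = []
    for eid, vec in embeddings.items():
        if eid == target_id:
            continue  # skip the entity itself
        if frozenset((target_id, eid)) in typed_edges:
            continue  # already linked by a typed relation, so not a "similarity-only" candidate
        score = cosine(target, vec)
        if score >= threshold:
            results.append((eid, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


if __name__ == "__main__":
    # Toy 2-D embeddings with hypothetical ids, for illustration only.
    emb = {
        "finding": [1.0, 0.0],
        "near": [0.9, 0.1],
        "q": [0.8, 0.6],
        "claim": [0.6, 0.8],
        "linked": [1.0, 0.1],
    }
    edges = {frozenset(("finding", "linked"))}
    print(similar_untyped_neighbors("finding", emb, edges))
```

In this toy run, "linked" is excluded despite being very similar because it already has a typed edge, and "claim" falls below the 0.65 cutoff. That mirrors the page's description of the list as candidates for new edges or unrecognized duplicates.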