finding
active
finding:concept-interventions-on-some-concepts-act-as-wrecking-ball-interventions-collapsing-global-model-performance

Concept interventions on some concepts act as 'wrecking-ball' interventions, collapsing global model performance.

Steering certain concepts was observed to cause a catastrophic drop in overall model performance.
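As a rough illustration of what such a drop looks like, the sketch below steers toy activations along a concept direction and compares accuracy before and after. It is a minimal sketch under stated assumptions, not the source paper's procedure: the additive steering rule, the linear readout, the synthetic data, and every name (`steer`, `accuracy`, `performance_drop`, `alpha`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def steer(acts, direction, alpha):
    """Add alpha times the unit-normalized concept direction to every activation row."""
    unit = direction / np.linalg.norm(direction)
    return acts + alpha * unit

def accuracy(acts, readout, labels):
    """Fraction of rows whose argmax logit under a linear readout matches the label."""
    preds = np.argmax(acts @ readout, axis=1)
    return float(np.mean(preds == labels))

def performance_drop(acts, readout, labels, direction, alpha):
    """Baseline accuracy minus accuracy after steering along `direction`."""
    steered = steer(acts, direction, alpha)
    return accuracy(acts, readout, labels) - accuracy(steered, readout, labels)

# Synthetic setup (assumption): two balanced classes separated along axis 0 only.
labels = np.arange(200) % 2
acts = np.zeros((200, 4))
acts[:, 0] = np.where(labels == 0, 1.0, -1.0)

# Hypothetical linear readout that uses only axis 0 to tell the classes apart.
readout = np.zeros((4, 2))
readout[0] = [1.0, -1.0]

# A direction the readout ignores versus one it depends on.
benign_direction = np.array([0.0, 1.0, 0.0, 0.0])
entangled_direction = np.array([1.0, 0.0, 0.0, 0.0])

drop_benign = performance_drop(acts, readout, labels, benign_direction, alpha=5.0)
drop_entangled = performance_drop(acts, readout, labels, entangled_direction, alpha=5.0)
print(drop_benign, drop_entangled)
```

In this toy setting, steering along the direction the readout ignores leaves accuracy unchanged, while steering along the direction it relies on pushes every prediction to one class. Sweeping `alpha` over a range of values would show where, if anywhere, a given concept starts to behave this way.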

Source paper

extracted_from
Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders
(2026) · William Lehn-Schiøler · Magnus Ruud Kjær · Rahul Thapa · M. Pedersen +9
