thinker:magnus-ruud-kj-r
Magnus Ruud Kjær
Authored papers (1)
Applying TopK Sparse Autoencoders (SAEs) to three architecturally distinct EEG foundation models (SleepFM, REVE, and LaBraM) reveals that clinical concepts are not cleanly separable in these models' latent spaces, with age-pathology confounding emerging as a structural failure mode rather than a tuning artifact. A single hyperparameter procedure, guided by an intrinsic dictionary health audit, transfers across all three architectures without per-model recalibration. The paper introduces a 'target vs. off-target' probe area metric for concept steering, which operationalizes steering selectivity and exposes three distinct regimes: selectively steerable, encoded but entangled, and non-encoded. Critically, some interventions act as 'wrecking-ball' manipulations that collapse global model performance, so for those concepts targeted suppression is not possible without corrupting the broader representation. A spectral decoder then maps latent interventions back to physiologically interpretable frequency signatures, including pathological slow-wave suppression and α-band restoration, grounding abstract latent operations in clinically recognizable EEG phenomena. Benchmarked against a clinical taxonomy spanning abnormality, age, sex, and medication, the framework quantifies monosemanticity and entanglement across architectures. The paper argues that current EEG foundation models carry embedded clinical confounds that are mechanistically inseparable, which poses a direct barrier to safe deployment in diagnostic settings without architectural changes that enforce disentanglement.
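For readers unfamiliar with the TopK SAE named in the summary, the following is a minimal, dependency-free sketch of one forward pass: encode an embedding into a wide dictionary, keep only the k largest activations, and decode linearly. It is an illustration of the general technique, not the paper's implementation. The function name `topk_sae_forward`, the weight layout, the subtraction of the decoder bias before encoding, and all sizes are assumptions made for this example.

```python
import random


def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """One forward pass of a TopK sparse autoencoder on a single vector.

    Assumed (hypothetical) layout: W_enc is d_model x n_features,
    W_dec is n_features x d_model, biases are flat lists.
    """
    n_features = len(b_enc)
    d_model = len(b_dec)
    # Center the input with the decoder bias (a common convention, assumed here).
    centered = [x[j] - b_dec[j] for j in range(d_model)]
    # Pre-activation for every dictionary feature.
    pre = [
        sum(centered[j] * W_enc[j][i] for j in range(d_model)) + b_enc[i]
        for i in range(n_features)
    ]
    # TopK step: keep the k largest pre-activations, zero everything else.
    keep = sorted(range(n_features), key=lambda i: pre[i], reverse=True)[:k]
    z = [0.0] * n_features
    for i in keep:
        z[i] = max(pre[i], 0.0)  # ReLU on the surviving features
    # Linear decode back to the embedding space.
    x_hat = [
        sum(z[i] * W_dec[i][j] for i in range(n_features)) + b_dec[j]
        for j in range(d_model)
    ]
    return z, x_hat


# Toy usage with random weights (sizes are arbitrary, for illustration only).
random.seed(0)
d_model, n_features, k = 8, 32, 4
x = [random.gauss(0, 1) for _ in range(d_model)]
W_enc = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(d_model)]
W_dec = [[W_enc[j][i] for j in range(d_model)] for i in range(n_features)]
b_enc = [0.0] * n_features
b_dec = [0.0] * d_model
z, x_hat = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
```

In this sketch the sparsity level is fixed by `k` rather than by a penalty term, which is the defining property of the TopK variant: at most `k` entries of `z` are nonzero for every input.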
Co-authors (12)
- James Zou (10 shared)
- Lars Kai Hansen (10 shared)
- Nick Williams (10 shared)
- Radu Gatej (10 shared)
- Rahul Thapa (10 shared)
- Sadasivan Puthusserypady (10 shared)
- Sándor Beniczky (10 shared)
- Tue Lehn-Schiøler (10 shared)
- William Lehn-Schiøler (10 shared)
- Anton Mosquera Storgaard (7 shared)
- Magnus Guldberg Pedersen (7 shared)
- Andreas Brink-Kjær (3 shared)
Recent mentions (1)
- papers-typedlehn-schi-ler-2026-mechanistic.md