Andreas Brink-Kjær

Co-author of the paper

Authored papers (1)

Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders (2026)
Applying TopK Sparse Autoencoders (SAEs) to three architecturally distinct EEG foundation models (SleepFM, REVE, and LaBraM) reveals that clinical concepts are not cleanly separable in these models' latent spaces, with age-pathology confounding emerging as a structural failure mode rather than a tuning artifact. A single hyperparameter procedure, guided by an intrinsic dictionary health audit, transfers robustly across all three architectures without per-model recalibration. The paper introduces a 'target vs. off-target' probe area metric for concept steering, which operationalizes steering selectivity and exposes three distinct regimes: selectively steerable, encoded but entangled, and non-encoded. Critically, some interventions act as 'wrecking-ball' manipulations that collapse global model performance; for the concepts involved, targeted suppression is not possible without corrupting the broader representation. A spectral decoder then maps latent interventions back to physiologically interpretable frequency signatures, including pathological slow-wave suppression and α-band restoration, grounding abstract latent operations in clinically recognizable EEG phenomena. Benchmarked against a clinical taxonomy spanning abnormality, age, sex, and medication, the framework quantifies monosemanticity and entanglement across architectures. The paper argues that current EEG foundation models carry embedded clinical confounds that are mechanistically inseparable, posing a direct barrier to safe deployment in diagnostic settings without architectural changes that enforce disentanglement.
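
For readers unfamiliar with the TopK SAE mentioned in the summary, the following is a minimal NumPy sketch of the general technique: encode an embedding, keep only the k largest pre-activations per sample, and decode. It is not the paper's implementation. The function name, weight shapes, the bias-subtraction convention, and the random inputs are all illustrative assumptions, and no training loop is shown.

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Generic TopK SAE forward pass (illustrative, not the paper's code).

    x: (batch, d_model) embeddings; returns sparse latents and reconstruction.
    """
    # Assumed convention: centre by the decoder bias before encoding.
    pre = (x - b_dec) @ W_enc + b_enc                  # (batch, n_latents)
    # Column indices of the k largest pre-activations in each row.
    top_idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    rows = np.arange(pre.shape[0])[:, None]
    z = np.zeros_like(pre)
    # Keep only the top-k entries, clipped at zero; everything else stays 0.
    z[rows, top_idx] = np.maximum(pre[rows, top_idx], 0.0)
    x_hat = z @ W_dec + b_dec                          # reconstruction
    return z, x_hat

# Hypothetical sizes and random weights, purely for demonstration.
rng = np.random.default_rng(0)
d_model, n_latents, k = 16, 64, 4
x = rng.normal(size=(8, d_model))
W_enc = rng.normal(scale=0.1, size=(d_model, n_latents))
W_dec = rng.normal(scale=0.1, size=(n_latents, d_model))
b_enc = np.zeros(n_latents)
b_dec = np.zeros(d_model)

z, x_hat = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
```

Each row of `z` has at most `k` non-zero entries, which is the property that lets individual latents be probed or steered one at a time.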

Co-authors (12)

Recent mentions (1)

papers-typed
lehn-schi-ler-2026-mechanistic.md