claim

active

claim:a-single-sae-hyperparameter-procedure-driven-by-an-intrinsic-dictionary-health-audit-transfers-robustly-across-all-three-eeg-transformer-architectures

A single SAE hyperparameter procedure driven by an intrinsic dictionary health audit transfers robustly across all three EEG transformer architectures.

Key methodological contribution: a claim about architecture-agnostic SAE tuning.

Source paper

extracted_from

Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders

(2026) · William Lehn-Schiøler · Magnus Ruud Kjær · Rahul Thapa · M. Pedersen +9

Neighborhood — ranked by edge-count

Findings (2)

finding

A single hyperparameter procedure driven by the intrinsic dictionary health audit transfers robustly across SleepFM, REVE, and LaBraM.
supports
Demonstrates architecture-agnostic applicability of the SAE tuning method
SAEs successfully extract sparse feature dictionaries from embeddings of SleepFM, REVE, and LaBraM EEG transformers.
supports
Foundational empirical result enabling all downstream analysis

Communities (3)

community

Manifold-aware concept steering in neural representations
members_of
Explores geometry of activation/behavior manifolds to enable selective, non-destructive concept interventions.
Concept entanglement in biomedical foundation models
members_of
Investigates the inseparability of clinical concepts (age, pathology) in EEG transformers, using SAE feature analysis and steering metrics across the SleepFM, REVE, and LaBraM architectures.
Dictionary health audit transfer
members_of
Hyperparameter procedure validated across SleepFM, REVE, and LaBraM EEG transformer architectures.

Methods (1)

method

Intrinsic Dictionary Health Audit
supports
A hyperparameter selection procedure driven by intrinsic measures of SAE dictionary quality that transfers across architectures
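
As a rough illustration of what an intrinsic audit of this kind might compute, the sketch below scores an SAE dictionary using only its own activations. The record does not say which measures the paper's audit uses. The three metrics here (dead-feature fraction, dense-feature fraction, mean L0) are common SAE diagnostics chosen as assumptions, and the names `audit_dictionary`, `select_configs`, and all thresholds are hypothetical.

```python
from typing import Dict, List, Sequence


def audit_dictionary(
    activations: Sequence[Sequence[float]],
    dense_threshold: float = 0.5,
) -> Dict[str, float]:
    """Summarize dictionary health from a (samples x features) activation matrix.

    Assumed metrics (not taken from the source paper):
      dead_fraction  - share of features that never fire on any sample
      dense_fraction - share of features firing on more than
                       `dense_threshold` of the samples
      mean_l0        - average number of active features per sample
    """
    n_samples = len(activations)
    n_features = len(activations[0])

    # Count, per feature, how many samples activate it.
    fire_counts = [0] * n_features
    for row in activations:
        for j, value in enumerate(row):
            if value > 0.0:
                fire_counts[j] += 1

    dead = sum(1 for c in fire_counts if c == 0)
    dense = sum(1 for c in fire_counts if c / n_samples > dense_threshold)
    return {
        "dead_fraction": dead / n_features,
        "dense_fraction": dense / n_features,
        "mean_l0": sum(fire_counts) / n_samples,
    }


def select_configs(
    audits: Dict[str, Dict[str, float]],
    max_dead: float = 0.1,
    max_dense: float = 0.1,
) -> List[str]:
    """Return the config names whose audit passes both (assumed) thresholds."""
    return [
        name
        for name, audit in audits.items()
        if audit["dead_fraction"] <= max_dead
        and audit["dense_fraction"] <= max_dense
    ]
```

Because the metrics depend only on SAE activations and not on labels or model internals, the same selection rule could in principle be applied unchanged to embeddings from different architectures, which is the property this claim describes.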

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Are the features extracted by SAEs from EEG transformers monosemantic or entangled? · question · 0.777
Research question motivating the monosemanticity and entanglement benchmarking
We hypothesize that applying SAE-based mechanistic interpretability to EEG foundation models can expose representational failures and thereby improve clinical trust. · hypothesis · 0.776
Overarching motivating hypothesis of the paper
Scaling laws analysis for SAE hyperparameters · method · 0.775
Sweeping number of features and training steps to find compute-optimal SAE configurations.
The spectral decoder successfully translates latent SAE interventions into physiologically interpretable frequency signatures such as slow-wave suppression and α-band restoration. · claim · 0.773
Key result linking abstract latent manipulations to known EEG neurophysiology
SAE features can be grounded in clinical taxonomy (abnormality, age, sex, medication) to benchmark monosemanticity and entanglement. · claim · 0.767
Claim that feature grounding enables interpretability metrics.
SAE training loss decreases as a power law with compute budget when using compute-optimal hyperparameters. · finding · 0.761
From scaling laws sweep.
Clinical concepts in EEG foundation models fall into three operational regimes: selectively steerable, encoded but entangled, and non-encoded. · claim · 0.751
Interpretive claim summarizing the spectrum of concept steerability discovered.
SAE features generalize to images despite training only on text, indicating out-of-distribution robustness. · claim · 0.748
A promising property for interpretability analysis off-distribution.
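
The power-law finding listed in this neighborhood (training loss versus compute budget) can be checked with a simple log-log fit. The sketch below is a minimal illustration, assuming a pure power law of the form loss ≈ a · compute^(−b) with no irreducible-loss offset. The function name `fit_power_law` is hypothetical, and the record does not describe the paper's actual fitting procedure.

```python
import math
from typing import List, Tuple


def fit_power_law(compute: List[float], loss: List[float]) -> Tuple[float, float]:
    """Fit loss ~ a * compute**(-b) by least squares in log-log space.

    Returns (a, b). Assumes all inputs are strictly positive.
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Ordinary least-squares slope and intercept of log(loss) vs log(compute).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # log(loss) = log(a) - b * log(compute), so a = exp(intercept), b = -slope.
    return math.exp(intercept), -slope
```

For example, applying `fit_power_law` to synthetic points generated from a known power law should recover its coefficient and exponent up to floating-point error. A clearly non-linear log-log plot would suggest the pure power-law assumption does not hold.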