hypothesis

active

hypothesis:we-hypothesize-that-applying-sae-based-mechanistic-interpretability-to-eeg-foundation-models-can-expose-representational-failures-and-thereby-improve-clinical-trust

We hypothesize that applying SAE-based mechanistic interpretability to EEG foundation models can expose representational failures and thereby improve clinical trust.

Overarching motivating hypothesis of the paper

Source paper

extracted_from

Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders

(2026) · William Lehn-Schiøler · Magnus Ruud Kjær · Rahul Thapa · M. Pedersen +9

Neighborhood — ranked by edge-count

Findings (2)

finding

Concept steering experiments identify three distinct operational regimes across clinical concepts in EEG foundation models.
associated_with
Main empirical finding of the concept steering analysis
Spectral decoder reveals pathological slow-wave suppression as a frequency signature of concept steering interventions in EEG foundation models.
associated_with
Links latent space manipulation to known EEG neurophysiology

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

EEG foundation models achieve state-of-the-art clinical performance yet their internal computations remain opaque, constituting a barrier to clinical trust. · claim · 0.836
Motivating claim for the entire paper
Clinical concepts in EEG foundation models fall into three operational regimes: selectively steerable, encoded but entangled, and non-encoded. · claim · 0.812
Interpretive claim summarizing the spectrum of concept steerability discovered.
SAE features can be grounded in clinical taxonomy (abnormality, age, sex, medication) to benchmark monosemanticity and entanglement. · claim · 0.800
Claim that feature grounding enables interpretability metrics.
SAE-based mechanistic interpretability will be superseded by manifold-based analysis for understanding semantic concepts within 24 months. · prediction · 0.788
Age and pathology are clinically entangled in EEG foundation model representations such that suppressing one concept inevitably corrupts the other. · claim · 0.785
A specific representational failure with direct clinical safety implications
How are clinical concepts represented and steerable in EEG foundation models? · question · 0.782
Core research question driving the mechanistic investigation.
A single SAE hyperparameter procedure driven by an intrinsic dictionary health audit transfers robustly across all three EEG transformer architectures. · claim · 0.776
Key methodological contribution claim about architecture-agnostic SAE tuning
Circuits could act as an epistemic foundation for interpretability by breaking down model behavior into falsifiable statements about small subgraphs. · claim · 0.768
Normative vision for how the circuits agenda could resolve the pre-paradigmatic state of interpretability