hypothesis
active
hypothesis:we-hypothesize-that-applying-sae-based-mechanistic-interpretability-to-eeg-foundation-models-can-expose-representational-failures-and-thereby-improve-clinical-trust
We hypothesize that applying SAE-based mechanistic interpretability to EEG foundation models can expose representational failures and thereby improve clinical trust.
Overarching motivating hypothesis of the paper
Source paper
extracted_from · (2026) · William Lehn-Schiøler · Magnus Ruud Kjær · Rahul Thapa · M. Pedersen +9
Neighborhood — ranked by edge-count
Findings (2)
finding
- Concept steering experiments identify three distinct operational regimes across clinical concepts in EEG foundation models. · associated_with · Main empirical finding of the concept steering analysis
- Links latent space manipulation to known EEG neurophysiology
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Motivating claim for the entire paper
- Interpretive claim summarizing the spectrum of concept steerability discovered.
- Claim that feature grounding enables interpretability metrics.
- A specific representational failure with direct clinical safety implications
- Core research question driving the mechanistic investigation.
- Key methodological contribution claim about architecture-agnostic SAE tuning
- Normative vision for how the circuits agenda could resolve the pre-paradigmatic state of interpretability