question
active
question:are-the-features-extracted-by-saes-from-eeg-transformers-monosemantic-or-entangled
Are the features extracted by SAEs from EEG transformers monosemantic or entangled?
Research question motivating the monosemanticity and entanglement benchmarking
Source paper
extracted_from · (2026) · William Lehn-Schiøler · Magnus Ruud Kjær · Rahul Thapa · M. Pedersen +9
Neighborhood — ranked by edge-count
Claims (1)
claim
- A specific representational failure with direct clinical safety implications
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Foundational empirical result enabling all downstream analysis
- Quantitative assessment of feature quality using clinical concepts across models
- Claim that feature grounding enables interpretability metrics
- Key methodological contribution claim about architecture-agnostic SAE tuning
- Automated interpretability and specificity ratings show SAE features are clearer than MLP neurons
- Central claim of the paper, supported by detailed feature analysis, human evaluation, automated interpretability of activations, and automated interpretability of logit weights
- Overarching motivating hypothesis of the paper
- The internal representations of EEG transformers from which SAE features are extracted