monosemanticity

Interpretability property in which a latent feature represents a single semantic concept; benchmarked across the three EEG architectures (SleepFM, REVE, LaBraM).

Neighborhood — ranked by edge-count

framework

TopK Sparse Autoencoders
implements
The central mechanistic interpretability tool, applied across all three EEG transformers to extract sparse feature dictionaries.
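
To make the TopK mechanism concrete, here is a minimal forward-pass sketch in plain Python. It is an illustration under stated assumptions, not the implementation used for the EEG models: the function name `topk_sae_forward`, the weight layout, the ReLU applied to the kept activations, and the toy weights are all hypothetical.

```python
def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Encode x, keep only the k largest pre-activations, then decode.

    Assumed (hypothetical) layout: W_enc is n_latents x d_in,
    W_dec is n_latents x d_in, biases are flat lists.
    """
    # One pre-activation per latent feature.
    pre = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(W_enc, b_enc)]
    # Indices of the k largest pre-activations.
    top = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    # Sparse code: zero everywhere except the kept latents.
    z = [0.0] * len(pre)
    for i in top:
        z[i] = max(pre[i], 0.0)  # assumption: ReLU on the kept values
    # Reconstruction as a weighted sum of decoder rows.
    x_hat = [b_dec[j] + sum(z[i] * W_dec[i][j] for i in top)
             for j in range(len(x))]
    return z, x_hat


# Toy, untrained weights chosen only to show the sparsity pattern.
W_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
b_enc = [0.0, 0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
b_dec = [0.0, 0.0]

z, x_hat = topk_sae_forward([2.0, 1.0], W_enc, b_enc, W_dec, b_dec, k=2)
print(z)      # at most k non-zero entries
print(x_hat)
```

With `k=2`, only two of the four latents stay active for any input. That hard cap is what yields a sparse feature dictionary; the reconstruction here is poor only because the weights are untrained.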

concept

Polysemanticity
associated_with, related_to
Neurons that respond to multiple unrelated concepts, limiting interpretability.
Clinical Taxonomy
associated_with
The grounding schema, comprising abnormality, age, sex, and medication, used to interpret SAE features.
Feature splitting
associated_with
Phenomenon where a feature in a small SAE splits into multiple finer features in a larger SAE.

event

Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders (2026)
mentions
Preprint applying TopK SAEs to three EEG transformers to reveal sparse feature dictionaries, steering regimes, and spectral interpretation.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Monosemantic Functional Features · concept · 0.844
Features that correspond to a single semantic concept and are effective for steering behavior.
Towards Monosemanticity: Decomposing Language Models with Dictionary Learning (Bricken et al., 2023) · concept · 0.803
Foundational SAE mechanistic interpretability paper.
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (Templeton et al., 2024) · concept · 0.799
Key paper on scaling SAE-based interpretability to frontier models, cited as precedent.
Monosemanticity and entanglement of SAE features were benchmarked for clinical taxonomy grounding across SleepFM, REVE, and LaBraM. · finding · 0.790
Quantitative assessment of feature quality using clinical concepts across models.
Monoid · concept · 0.760
Standard algebraic abstraction with identity (ε) and associative binary operation ('); used to specify Image overlay operations.
nondeterminism · concept · 0.747
Inherent in Linda because an in statement chooses one matching tuple arbitrarily; essential for many parallel patterns.
Modularity · concept · 0.733
Property of developmental systems where functions are encapsulated in modules with simple triggers, enhancing evolvability.
Incommensurability · concept · 0.731
Kuhn's concept: the inability of ideas from one paradigm to be translated into the terms of another, causing communication breakdowns.
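
The "cosine ≥ 0.65 · no typed edge" list above could be produced by a filter along the following lines. This is a sketch under assumptions: the embedding vectors, entity names, and the helper `untyped_neighbors` are hypothetical, and the page does not say how its embeddings or scores are actually computed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Entities at or above the threshold that have no typed edge to target.

    Returns (name, similarity) pairs, highest similarity first.
    """
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in typed_edges:
            continue  # skip the entity itself and already-linked entities
        sim = cosine(embeddings[target], vec)
        if sim >= threshold:
            hits.append((name, sim))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Made-up 2-D embeddings purely for illustration.
embeddings = {
    "target": [1.0, 0.0],
    "linked_entity": [0.9, 0.1],    # similar, but already has a typed edge
    "candidate_entity": [0.8, 0.6],  # similar, no typed edge
    "distant_entity": [0.0, 1.0],    # below the threshold
}
candidates = untyped_neighbors("target", embeddings, {"linked_entity"})
print(candidates)
```

A pure similarity cut like this explains why lexically or topically adjacent but unrelated entries can surface next to genuine candidates. The list is meant to be reviewed, not trusted as-is.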