Concept entanglement in biomedical foundation models

Investigates the inseparability of clinical concepts (such as age and pathology) in EEG transformers, using SAE feature analysis and steering metrics across the SleepFM, REVE, and LaBraM architectures.

7 members.

Drawn from 1 source

The papers/notes whose extracted claims & findings make up this cluster.

Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders (7 members)

Bridges (5)

Other communities that share members with this one: cross-cutting threads or papers that sit at the seam between two themes.

Manifold-aware concept steering in neural representations (7 shared)
Dictionary health audit transfer (2 shared)
SAE Feature Geometry in Biomedical Signals (2 shared)
Age-pathology concept entanglement (2 shared)
Geometric concept representations in neural networks (1 shared)

Findings (4)

A single hyperparameter procedure driven by the intrinsic dictionary health audit transfers robustly across SleepFM, REVE, and LaBraM. (Demonstrates architecture-agnostic applicability of the SAE tuning method.)
Age-pathology confounding was observed: one concept could not be suppressed without corrupting the other. (Empirical demonstration of entanglement between age and pathology features.)
Concept steering with the target vs off-target probe area metric reveals three operational regimes (selectively steerable, encoded but entangled, non-encoded) across SleepFM, REVE, and LaBraM. (Result categorizing concept steerability into three distinct regimes.)
Monosemanticity and entanglement of SAE features were benchmarked for clinical taxonomy grounding across SleepFM, REVE, and LaBraM. (Quantitative assessment of feature quality using clinical concepts across models.)
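
The three-regime steering finding above can be illustrated with a minimal sketch. It assumes that, for each steering strength, one probe measures the shift in the targeted concept and another measures the shift in an off-target concept, and that the area under each curve is compared. The function names, the thresholds `min_target_area` and `max_off_ratio`, and the decision rule are hypothetical illustrations, not the source's actual metric definition.

```python
import numpy as np


def trapezoid_area(x, y):
    """Area under y(x) by the trapezoid rule (written out to avoid NumPy version differences)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))


def steering_regime(strengths, target_shift, off_target_shift,
                    min_target_area=0.1, max_off_ratio=0.5):
    """Assign one of three regimes from probe-shift curves.

    strengths        : steering strengths, in increasing order
    target_shift     : change in the target-concept probe output at each strength
    off_target_shift : change in an off-target probe output at each strength
    Both thresholds are illustrative assumptions.
    """
    target_area = trapezoid_area(strengths, np.abs(target_shift))
    off_area = trapezoid_area(strengths, np.abs(off_target_shift))

    # Steering barely moves the target probe: treat the concept as not encoded.
    if target_area < min_target_area:
        return "non-encoded"
    # The target moves while off-target probes stay comparatively stable.
    if off_area / target_area <= max_off_ratio:
        return "selectively steerable"
    # The target moves, but off-target probes move with it.
    return "encoded but entangled"
```

For example, under these assumed thresholds, `steering_regime([0, 1, 2], [0, 0.5, 1.0], [0, 0.4, 0.8])` falls in the "encoded but entangled" regime, which is the pattern the age-pathology finding describes.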

Claims (3)

A single SAE hyperparameter procedure driven by an intrinsic dictionary health audit transfers robustly across all three EEG transformer architectures. (Key methodological contribution claim about architecture-agnostic SAE tuning.)
Age-pathology confounding prevents independent steering of age and pathology concepts. (Interpretive assertion about clinical entanglement in the representations.)
SAE features can be grounded in clinical taxonomy (abnormality, age, sex, medication) to benchmark monosemanticity and entanglement. (Claim that feature grounding enables interpretability metrics.)
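
One way to picture the grounding claim above is to score a single SAE feature against each clinical label and check whether one concept dominates. The sketch below is an assumption-laden illustration: it uses absolute Pearson correlation as the association score and the gap between the top two concepts as a monosemanticity proxy. The function names and the scoring rule are hypothetical and are not taken from the source.

```python
import numpy as np


def concept_scores(activations, labels):
    """Absolute Pearson correlation between one feature's activations and each concept label.

    activations : per-recording activation of a single SAE feature
    labels      : dict mapping concept name -> per-recording numeric label
    """
    a = np.asarray(activations, dtype=float)
    scores = {}
    for name, y in labels.items():
        y = np.asarray(y, dtype=float)
        # Correlation is undefined for constant vectors; score those as 0.
        if a.std() == 0 or y.std() == 0:
            scores[name] = 0.0
        else:
            scores[name] = float(abs(np.corrcoef(a, y)[0, 1]))
    return scores


def grounding_summary(scores):
    """Return (best concept, margin over the runner-up); needs at least two concepts.

    A large margin suggests the feature tracks one concept (more monosemantic);
    a small margin suggests it tracks several (more entangled).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, runner_up) = ranked[0], ranked[1]
    return best, top - runner_up
```

A feature whose activations correlate about equally with age and abnormality labels would get a margin near zero under this proxy, which is the kind of entanglement the age-pathology claim refers to.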