community
active
leiden_hybrid_concepts
label: haiku
community:leiden_hybrid_concepts-run4-c7-c4
Concept entanglement in biomedical foundation models
Investigates the inseparability of clinical concepts (age, pathology) in EEG transformers, using sparse autoencoder (SAE) feature analysis and steering metrics across the SleepFM, REVE, and LaBraM architectures.
7 members.
Drawn from 1 source
The papers/notes whose extracted claims & findings make up this cluster.
Bridges (5)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
Findings (4)
- A single hyperparameter procedure driven by the intrinsic dictionary health audit transfers robustly across SleepFM, REVE, and LaBraM. Demonstrates architecture-agnostic applicability of the SAE tuning method.
- Age-pathology confounding was observed: one concept cannot be suppressed without corrupting the other. Empirical demonstration of entanglement between age and pathology features.
- Concept steering with a target vs. off-target probe area metric reveals three operational regimes (selectively steerable, encoded but entangled, non-encoded) across SleepFM, REVE, and LaBraM. Result categorizing concept steerability into three distinct regimes.
- Monosemanticity and entanglement of SAE features were benchmarked by grounding them in a clinical taxonomy across SleepFM, REVE, and LaBraM. Quantitative assessment of feature quality using clinical concepts across models.
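The target vs. off-target comparison in the steering finding above can be illustrated with a minimal sketch. The source does not define the metric, so everything here is an assumption made for illustration: the function names `probe_area` and `steering_regime`, the reading of "probe area" as the area under the curve of probe-score change against steering strength, and the thresholds `min_area` and `max_ratio`. It is one plausible reading, not the method used in the underlying work.

```python
import numpy as np


def probe_area(strengths, probe_scores):
    """Area under |probe score - baseline| as a function of steering strength.

    Hypothetical definition: the baseline is the score at the first
    (unsteered) strength, and the area uses the trapezoidal rule.
    """
    s = np.asarray(strengths, dtype=float)
    y = np.asarray(probe_scores, dtype=float)
    shift = np.abs(y - y[0])  # change relative to the unsteered baseline
    return float(np.sum((shift[1:] + shift[:-1]) / 2.0 * np.diff(s)))


def steering_regime(target_area, off_target_area, min_area=0.05, max_ratio=0.5):
    """Map the two areas onto the three regimes named in the finding.

    min_area and max_ratio are illustrative thresholds, not values
    taken from the source.
    """
    if target_area < min_area:
        # Steering barely moves the target probe.
        return "non-encoded"
    if off_target_area / target_area > max_ratio:
        # The target moves, but another concept moves with it.
        return "encoded but entangled"
    return "selectively steerable"


# Toy inputs (made up for illustration, not measured results).
strengths = [0.0, 1.0, 2.0]
target = probe_area(strengths, [0.5, 0.7, 0.9])
off_flat = probe_area(strengths, [0.5, 0.5, 0.5])
off_coupled = probe_area(strengths, [0.5, 0.65, 0.8])

print(steering_regime(target, off_flat))
print(steering_regime(target, off_coupled))
print(steering_regime(0.0, off_flat))
```

Under this reading, the age-pathology case would fall in the "encoded but entangled" regime: steering one concept shifts the probe for the other by a comparable amount.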
Claims (3)
- A single SAE hyperparameter procedure driven by an intrinsic dictionary health audit transfers robustly across all three EEG transformer architectures. Key methodological contribution claim about architecture-agnostic SAE tuning.
- Age-pathology confounding prevents independent steering of age and pathology concepts. Interpretive assertion about clinical entanglement in the representations.
- SAE features can be grounded in a clinical taxonomy (abnormality, age, sex, medication) to benchmark monosemanticity and entanglement. Claim that feature grounding enables interpretability metrics.