concept
active
concept:monosemanticity

monosemanticity

An interpretability property in which each latent feature represents a single semantic concept; benchmarked across model architectures.

Neighborhood — ranked by edge-count

Frameworks (1)

framework
  • The central mechanistic interpretability tool, applied across all three EEG transformers to extract sparse feature dictionaries.

Concepts (3)

concept
  • Polysemanticity
    associated_with, related_to
    Neurons that respond to multiple unrelated concepts, limiting interpretability.
  • Clinical Taxonomy
    associated_with
    The grounding schema (abnormality, age, sex, and medication) used to interpret SAE features.
  • Feature splitting
    associated_with
    Phenomenon where a feature in a small SAE splits into multiple finer features in a larger SAE.
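
The feature-splitting entry above can be made concrete with a small sketch. It assumes that each SAE feature is summarized by a decoder direction (a plain list of floats) and that a small-SAE feature has "split" when two or more large-SAE features point in nearly the same direction. The function names, the 0.7 cosine threshold, and the toy vectors are hypothetical choices for illustration, not values taken from this entry.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def split_candidates(small_dict, large_dict, threshold=0.7):
    """Map each small-SAE feature index to the large-SAE feature indices
    whose directions have cosine similarity >= threshold.
    Only features with at least two such matches are reported as split candidates.
    The threshold is an illustrative assumption, not a standard value."""
    result = {}
    for i, u in enumerate(small_dict):
        matches = [j for j, v in enumerate(large_dict) if cosine(u, v) >= threshold]
        if len(matches) >= 2:
            result[i] = matches
    return result


# Toy directions (hypothetical): small feature 0 has two close neighbors in the
# larger dictionary, small feature 1 has exactly one.
small = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
large = [[0.9, 0.3, 0.0], [0.9, -0.3, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

candidates = split_candidates(small, large)
print(candidates)
```

In this toy case only small feature 0 is flagged, because it matches two large-dictionary directions while small feature 1 matches one. A directional match is only a candidate signal; whether the finer features are themselves monosemantic still has to be checked against a grounding schema.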

Events (1)

event

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.