Polysemanticity

Neurons that respond to multiple unrelated concepts, limiting interpretability.
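A toy illustration of how polysemantic neurons can arise, assuming the superposition picture: three hypothetical features are stored as directions in a layer with only two neurons, so each neuron carries part of more than one feature. The feature names, angles, and the 0.3 response threshold are arbitrary choices for illustration, not values from any specific model.

```python
import math

# Hypothetical toy layer: 3 features stored as directions in a 2-neuron space
# (more features than neurons). Angles are arbitrary illustrative choices.
FEATURE_ANGLES_DEG = {"feature_a": 0.0, "feature_b": 45.0, "feature_c": 90.0}
FEATURE_DIRECTIONS = {
    name: (math.cos(math.radians(deg)), math.sin(math.radians(deg)))
    for name, deg in FEATURE_ANGLES_DEG.items()
}

def neuron_activations(feature: str) -> tuple[float, float]:
    """Activations of the two neurons when only `feature` is present at unit strength."""
    return FEATURE_DIRECTIONS[feature]

def features_driving_neuron(neuron: int, threshold: float = 0.3) -> list[str]:
    """Features that push the given neuron's activation to at least `threshold`."""
    return [name for name, vec in FEATURE_DIRECTIONS.items() if vec[neuron] >= threshold]

def read_out(activation: tuple[float, float]) -> dict[str, float]:
    """Linear read-out of every feature: dot product of the activation with its direction."""
    return {name: vec[0] * activation[0] + vec[1] * activation[1]
            for name, vec in FEATURE_DIRECTIONS.items()}

for n in range(2):
    print(f"neuron {n} responds to: {features_driving_neuron(n)}")
# Presenting feature_a alone also yields a nonzero read-out for feature_b (interference).
print(read_out(neuron_activations("feature_a")))
```

With these choices, neuron 0 responds to feature_a and feature_b and neuron 1 to feature_b and feature_c, so neither neuron on its own identifies a single feature.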

Neighborhood (ranked by edge count)

framework

Superposition Hypothesis
associated_with
Core theoretical framework: neural networks represent more features than they have neurons by encoding features as directions in superposition.

concept

monosemanticity
associated_with · related_to
Interpretability property where a latent feature represents a single semantic concept; benchmarked across architectures.
Privileged Basis
associated_with
A property of activations where neural network features align with basis dimensions due to sparse activation functions; absent in the residual stream but present in tokens, attention patterns, and MLP activations
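The privileged-basis property can be illustrated with a small numerical check, a sketch of the usual argument rather than anything specified by this entry: a change of basis R can be absorbed into the weights of a purely linear map (W becomes R W Rᵀ), but it cannot be pushed through an elementwise activation function such as ReLU. The rotation angle, the weights `W`, and the input `x` are arbitrary illustrative values.

```python
import math

Vec = tuple[float, float]
Mat = tuple[Vec, Vec]  # row-major 2x2 matrix

def rotation(theta: float) -> Mat:
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def transpose(m: Mat) -> Mat:
    return ((m[0][0], m[1][0]), (m[0][1], m[1][1]))

def matvec(m: Mat, v: Vec) -> Vec:
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def matmul(a: Mat, b: Mat) -> Mat:
    cols = transpose(b)
    return tuple(tuple(r[0] * c[0] + r[1] * c[1] for c in cols) for r in a)

def relu(v: Vec) -> Vec:
    # Elementwise nonlinearity: acts on each basis coordinate separately.
    return (max(0.0, v[0]), max(0.0, v[1]))

def close(a: Vec, b: Vec, tol: float = 1e-9) -> bool:
    return all(abs(x - y) < tol for x, y in zip(a, b))

R = rotation(math.pi / 4)
Rt = transpose(R)
x = (1.0, -0.5)

# Linear layer: rotating the basis is undone by compensating weights W -> R W R^T.
W = ((2.0, 1.0), (0.5, -1.0))  # arbitrary example weights
W_rot = matmul(matmul(R, W), Rt)
linear_ok = close(matvec(Rt, matvec(W_rot, matvec(R, x))), matvec(W, x))

# ReLU: no such compensation exists, so the result depends on the chosen basis.
relu_ok = close(matvec(Rt, relu(matvec(R, x))), relu(x))

print(linear_ok, relu_ok)
```

The linear check prints True and the ReLU check prints False: once the basis is rotated, the ReLU output no longer matches, which is why the neuron basis of an MLP layer is special while a purely linearly accessed space is not.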

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
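How this list is computed is not stated here; a plausible reading is a threshold on cosine similarity between entity embeddings, restricted to entities that have no typed edge to the current one. The sketch below assumes exactly that. The function names, the toy 3-dimensional embeddings, and the `typed_neighbors` set are hypothetical.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def candidate_edges(target: str, embeddings: dict[str, list[float]],
                    typed_neighbors: set[str], threshold: float = 0.65) -> list[tuple[str, float]]:
    """Entities with cosine >= threshold to `target` and no typed edge to it, best first."""
    anchor = embeddings[target]
    scored = [
        (name, cosine(anchor, vec))
        for name, vec in embeddings.items()
        if name != target and name not in typed_neighbors
    ]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)

# Toy embeddings (hypothetical; real entity embeddings are far higher-dimensional).
embeddings = {
    "Polysemanticity": [1.0, 0.2, 0.0],
    "Polysemantic Neuron": [0.9, 0.3, 0.1],
    "Superposition Hypothesis": [0.8, 0.5, 0.0],
    "Modularity": [0.6, 0.1, 0.5],
    "Unrelated Entity": [0.0, 0.0, 1.0],
}
typed_neighbors = {"Superposition Hypothesis", "monosemanticity", "Privileged Basis"}

result = candidate_edges("Polysemanticity", embeddings, typed_neighbors)
print(result)
```

Here "Superposition Hypothesis" is skipped because it already has a typed edge, and "Unrelated Entity" falls below the threshold, leaving "Polysemantic Neuron" and "Modularity" in descending order of similarity.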

Polysemantic Neuron · concept · 0.798
A neuron that responds to multiple unrelated inputs, posing a major challenge for circuit-level interpretation
Monosemantic Functional Features · concept · 0.746
Features that correspond to a single semantic concept and are effective for steering behavior.
Monosemanticity and entanglement of SAE features were benchmarked for clinical taxonomy grounding across SleepFM, REVE, and LaBraM · finding · 0.743
Quantitative assessment of feature quality using clinical concepts across models.
Modularity · concept · 0.728
Property of developmental systems where functions are encapsulated in modules with simple triggers, enhancing evolvability.
Towards Monosemanticity: Decomposing Language Models with Dictionary Learning (Bricken et al., 2023) · concept · 0.723
Foundational mechanistic-interpretability paper that uses sparse autoencoders (SAEs) to decompose language-model activations into features.
No established method for resolving polysemantic neurons into pure features at scale · question · 0.717
Identified gap linking the polysemanticity challenge to the disentangled-representations literature.
Particle Plasticity · concept · 0.717
Physical plasticity of individual cells or particles enabling adaptation to novel environments.
Polycomputation · concept · 0.715
Biological architecture in which multiple competing or cooperating multi-scale agents develop interpretations of molecular and biophysical states rather than committing to a single interpretation.
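Bricken et al. (2023), listed above, decompose polysemantic activations with a sparse autoencoder (dictionary learning). The sketch below shows the forward pass and objective of such an autoencoder in the general form commonly used for this purpose (ReLU encoder, linear decoder, squared reconstruction error plus an L1 sparsity penalty). It is not the paper's code: the weights are hand-set for a toy 2-neuron layer rather than trained, and all names and values (`sae_forward`, `sae_loss`, `l1_coeff`, the biases, the decoder scale) are illustrative assumptions.

```python
import math

def relu(z: list[float]) -> list[float]:
    return [max(0.0, v) for v in z]

def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """f = ReLU(W_enc (x - b_dec) + b_enc); x_hat = W_dec f + b_dec."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    f = relu([z + b for z, b in zip(matvec(W_enc, centered), b_enc)])
    x_hat = [r + b for r, b in zip(matvec(W_dec, f), b_dec)]
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff=1e-3):
    """Squared reconstruction error plus an L1 penalty that pushes latents toward sparsity."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + l1_coeff * sum(abs(v) for v in f)

# Hand-set (not trained) weights: a 2-neuron layer holding 3 features in superposition.
dirs = [(math.cos(t), math.sin(t)) for t in (0.0, math.pi / 4, math.pi / 2)]
W_enc = [list(d) for d in dirs]                          # 3 latents x 2 neurons
b_enc = [-0.75, -0.75, -0.75]                            # threshold above interference (~0.71)
W_dec = [[4.0 * d[i] for d in dirs] for i in range(2)]   # 2 neurons x 3 latents
b_dec = [0.0, 0.0]

for k, d in enumerate(dirs):
    f, x_hat = sae_forward(list(d), W_enc, b_enc, W_dec, b_dec)
    active = [i for i, v in enumerate(f) if v > 0]
    print(k, active, round(sae_loss(list(d), f, x_hat), 6))
```

Each toy input activates exactly one latent and is reconstructed almost exactly, even though each neuron responds to two of the features; in practice the weights would be obtained by minimizing a loss like `sae_loss` over many recorded activations rather than set by hand.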