concept
active
concept:polysemanticity

Polysemanticity

The property of individual neurons responding to multiple unrelated concepts, which limits interpretability.

Neighborhood — ranked by edge-count

Frameworks (1)

framework
  • Core theoretical framework: neural networks represent more features than neurons by encoding features as directions in superposition
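
A minimal sketch of the idea above, assuming a toy geometry that is not taken from the framework entry itself: five features are stored in two neurons as evenly spaced unit directions. The sizes, the names (`directions`, `encode`, `decode`, `features_per_neuron`), and the ReLU readout are all hypothetical choices made for illustration.

```python
import math

N_FEATURES = 5  # hypothetical toy sizes, chosen only for illustration
N_NEURONS = 2

# Assumption: each feature gets a unit direction in neuron space,
# evenly spaced around a circle. Real networks learn their directions.
directions = [
    (math.cos(2 * math.pi * k / N_FEATURES),
     math.sin(2 * math.pi * k / N_FEATURES))
    for k in range(N_FEATURES)
]


def encode(feature_values):
    """Neuron activations: sum of each feature value times its direction."""
    return [
        sum(v * d[j] for v, d in zip(feature_values, directions))
        for j in range(N_NEURONS)
    ]


def decode(activations):
    """Project activations onto each feature direction, then apply a ReLU."""
    return [
        max(0.0, sum(a * x for a, x in zip(activations, d)))
        for d in directions
    ]


def features_per_neuron(tol=1e-9):
    """Count the features for which each neuron has a non-negligible weight."""
    return [
        sum(1 for d in directions if abs(d[j]) > tol)
        for j in range(N_NEURONS)
    ]


one_hot = [0.0, 0.0, 1.0, 0.0, 0.0]  # sparse input: only feature 2 is active
readout = decode(encode(one_hot))

print("features per neuron:", features_per_neuron())
print("readout:", [round(r, 3) for r in readout])
```

In this toy, every neuron carries weight for several features, which is what makes it polysemantic. When only one feature is active, that feature still reads back at about 1, while its two neighbouring directions pick up a smaller interference term (cos 72°, roughly 0.31). This is why the scheme relies on features being sparse.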

Claims (1)

claim

Concepts (2)

concept
  • monosemanticity
    associated_with, related_to
    Interpretability property where a latent feature represents a single semantic concept; benchmarked across architectures.
  • Privileged Basis
    associated_with
    A property of activations where neural network features align with basis dimensions due to sparse activation functions; absent in the residual stream but present in tokens, attention patterns, and MLP activations.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.