Polysemantic Neuron

A neuron that responds to multiple unrelated inputs, posing a major challenge for circuit-level interpretation

Neighborhood — ranked by edge-count

framework

Disentangled Representations
associated_with
Adjacent ML literature on separating independent factors of variation; related to but distinct from the polysemanticity problem studied here

concept

Superposition
associated_with
Phenomenon where models represent more features than dimensions via almost-orthogonal directions.
Pure Feature
contradicts
A feature that responds to only a single latent variable, contrasted with polysemantic features

finding

InceptionV1 neuron 4e:55 responds to cat faces, fronts of cars, and cat legs as unrelated stimuli
supports
Concrete example of polysemantic neuron demonstrating the challenge to the circuits agenda
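
The Superposition entry above describes models representing more features than dimensions via almost-orthogonal directions. The following is a minimal illustrative sketch of that geometric idea, not code from the source: the function names, the sizes (256 features, 64 dimensions), and the use of random Gaussian directions are all assumptions chosen for illustration. It draws more unit vectors than there are dimensions and measures how far from orthogonal they are.

```python
import numpy as np


def random_feature_directions(n_features, n_dims, seed=0):
    """Draw n_features random unit vectors in an n_dims-dimensional space."""
    rng = np.random.default_rng(seed)
    vecs = rng.standard_normal((n_features, n_dims))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)


def interference(directions):
    """Return absolute cosine similarities between all distinct pairs."""
    cos = directions @ directions.T          # rows are unit vectors, so this is cosine
    off_diag = ~np.eye(len(directions), dtype=bool)
    return np.abs(cos[off_diag])


# Assumed sizes: four times as many features as dimensions.
directions = random_feature_directions(n_features=256, n_dims=64)
overlap = interference(directions)

print(f"features={directions.shape[0]}, dims={directions.shape[1]}")
print(f"mean |cos|={overlap.mean():.3f}, max |cos|={overlap.max():.3f}")
```

The directions cannot all be exactly orthogonal once features outnumber dimensions, so some overlap remains. That residual overlap is one way to picture how a single neuron may end up responding to several unrelated features.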

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Polysemanticity · concept · 0.798
Neurons that respond to multiple unrelated concepts, limiting interpretability.
Polysemantic neurons are a major challenge for the circuits agenda, because N meanings in one neuron times M in another creates NxM effective connections that cannot be considered individually. · claim · 0.798
Precise characterization of why polysemanticity poses a combinatorial obstacle to circuit analysis
We hypothesize that polysemantic neurons may be resolvable by unfolding networks or training to avoid polysemanticity. · hypothesis · 0.793
Forward-looking proposal for how the polysemanticity challenge to circuits research might be overcome
No established method for resolving polysemantic neurons into pure features at scale · question · 0.780
Identified gap linking polysemanticity challenge to disentangled representations literature
Multilayer Perceptron · framework · 0.747
Multi Layer Perceptron · framework · 0.741
Network with hidden layers capable of representing non-linearly separable functions, enabling deep model induction
Models with 1-hot activation sparsity still have polysemantic neurons; a single neuron trained on 4 mutually exclusive features prefers a polysemantic representation, with loss ~0.7 vs 0.8 · finding · 0.736
Counter-example disproving that architectural sparsity alone can prevent polysemanticity
Neuron Resampling · method · 0.731
Periodically reinitializing dead autoencoder neurons using high-loss data points to improve feature coverage
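
The Neuron Resampling entry describes reinitializing dead autoencoder neurons using high-loss data points. Below is a hedged sketch of one such resampling step, not the source's procedure. It assumes a ReLU encoder, treats a neuron as dead if it never fired over a recent window, and assumes replacement inputs are sampled with probability proportional to squared loss. The function `resample_dead_neurons` and all of its parameters are hypothetical names introduced for illustration.

```python
import numpy as np


def resample_dead_neurons(W_enc, b_enc, acts, inputs, losses, rng):
    """Reinitialize encoder rows of neurons that never fired (hypothetical helper).

    W_enc:  (n_neurons, n_inputs) encoder weights, modified in place
    b_enc:  (n_neurons,) encoder biases, modified in place
    acts:   (n_samples, n_neurons) activations over a recent window
    inputs: (n_samples, n_inputs) inputs that produced acts
    losses: (n_samples,) per-sample reconstruction loss
    Returns the indices of the neurons that were resampled.
    """
    dead = np.flatnonzero((acts > 0).sum(axis=0) == 0)
    if dead.size == 0:
        return dead

    # Assumption: favor poorly reconstructed inputs, weighting by squared loss.
    weights = losses ** 2
    total = weights.sum()
    probs = weights / total if total > 0 else np.full(len(losses), 1.0 / len(losses))

    picks = rng.choice(len(inputs), size=dead.size, p=probs)
    new_dirs = inputs[picks]
    norms = np.maximum(np.linalg.norm(new_dirs, axis=1, keepdims=True), 1e-12)

    W_enc[dead] = new_dirs / norms   # point each dead neuron at a high-loss input
    b_enc[dead] = 0.0                # reset bias so the neuron can fire again
    return dead


# Toy usage with made-up data: neuron 2 is forced dead by a large negative bias.
rng = np.random.default_rng(0)
inputs = rng.standard_normal((100, 8))
W_enc = rng.standard_normal((4, 8))
b_enc = np.zeros(4)
b_enc[2] = -100.0
acts = np.maximum(inputs @ W_enc.T + b_enc, 0.0)   # ReLU encoder activations
losses = rng.random(100)                           # placeholder per-sample losses

dead = resample_dead_neurons(W_enc, b_enc, acts, inputs, losses, rng)
print("resampled neurons:", dead.tolist())
```

In a real training loop this step would run only periodically, and the decoder weights and optimizer state for the resampled neurons would likely need matching treatment; both are left out of this sketch.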