concept
active
concept:polysemantic-neuron
Polysemantic Neuron
A neuron that responds to multiple unrelated inputs, which poses a major challenge for circuit-level interpretation.
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- Disentangled Representations · associated_with · Adjacent ML literature on separating independent factors of variation; related to but distinct from the polysemanticity problem studied here.
Concepts (2)
concept
- Superposition · associated_with · Phenomenon where models represent more features than dimensions via almost-orthogonal directions.
- Pure Feature · contradicts · A feature that responds to only a single latent variable, contrasted with polysemantic features.
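The link between superposition and polysemantic neurons can be illustrated with a small numerical sketch. It is not taken from any cited work: the sizes (128 features, 32 neurons), the random choice of directions, and the 0.2 weight threshold are all assumptions made for illustration. It packs more feature directions than there are neurons, measures how far the directions are from orthogonal, and counts how many features load on each individual neuron.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a direction uniformly on the unit sphere in `dim` dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_feature_directions(n_features, n_neurons, seed=0):
    """One unit direction per feature, living in neuron (activation) space."""
    rng = random.Random(seed)
    return [random_unit_vector(n_neurons, rng) for _ in range(n_features)]


def interference(directions):
    """Mean and max |cosine| over all distinct pairs of feature directions."""
    total, worst, pairs = 0.0, 0.0, 0
    for i in range(len(directions)):
        for j in range(i + 1, len(directions)):
            c = abs(dot(directions[i], directions[j]))
            total += c
            worst = max(worst, c)
            pairs += 1
    return total / pairs, worst


def features_per_neuron(directions, threshold):
    """For each neuron, count the features whose direction loads on it."""
    n_neurons = len(directions[0])
    return [
        sum(1 for d in directions if abs(d[k]) > threshold)
        for k in range(n_neurons)
    ]


# Illustrative sizes (assumptions): 4x more features than neurons.
directions = build_feature_directions(n_features=128, n_neurons=32)
mean_cos, max_cos = interference(directions)
counts = features_per_neuron(directions, threshold=0.2)

print(f"features={len(directions)}, neurons={len(directions[0])}")
print(f"mean |cos| between features: {mean_cos:.3f}, max: {max_cos:.3f}")
print(f"features loading on each neuron: min={min(counts)}, max={max(counts)}")
```

In this sketch every neuron carries weight from many features at once, which is the polysemantic picture; a pure feature in the sense above would correspond to a direction aligned with a single neuron. Random directions are only roughly orthogonal on average, so a trained model may arrange its features quite differently.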
Findings (1)
finding
- InceptionV1 neuron 4e:55 responds to cat faces, fronts of cars, and cat legs as unrelated stimuli · supports · Concrete example of a polysemantic neuron, demonstrating the challenge to the circuits agenda.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Neurons that respond to multiple unrelated concepts, limiting interpretability.
- Precise characterization of why polysemanticity poses a combinatorial obstacle to circuit analysis
- We hypothesize that polysemantic neurons may be resolvable by unfolding networks or training to avoid polysemanticity. · hypothesis · 0.793 · Forward-looking proposal for how the polysemanticity challenge to circuits research might be overcome.
- Identified gap linking polysemanticity challenge to disentangled representations literature
- Network with hidden layers capable of representing non-linearly separable functions, enabling deep model induction
- Counter-example disproving that architectural sparsity alone can prevent polysemanticity
- Periodically reinitializing dead autoencoder neurons using high-loss data points to improve feature coverage