claim

active

claim:polysemantic-neurons-are-a-major-challenge-for-the-circuits-agenda-because-n-meanings-in-one-neuron-times-m-in-another-creates-nxm-effective-connections-that-cannot-be-considered-individually

Polysemantic neurons are a major challenge for the circuits agenda: N meanings in one neuron and M meanings in another create N×M effective connections that cannot be considered individually.

Precise characterization of why polysemanticity poses a combinatorial obstacle to circuit analysis
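
The combinatorial point in the claim can be illustrated with a minimal sketch. The neuron meanings below are hypothetical labels chosen for illustration, and `effective_connections` is an assumed helper name, not something defined in the source paper. The sketch enumerates the feature-to-feature pairs that a single neuron-to-neuron weight has to stand for.

```python
from itertools import product


def effective_connections(meanings_a, meanings_b):
    """Return every (meaning in A, meaning in B) pair that one
    neuron-to-neuron weight implicitly covers."""
    return list(product(meanings_a, meanings_b))


# Hypothetical meanings, for illustration only.
neuron_a = ["cat face", "car front", "cat leg"]  # N = 3 meanings
neuron_b = ["dog snout", "text"]                 # M = 2 meanings

pairs = effective_connections(neuron_a, neuron_b)
print(len(pairs))  # N * M = 6 effective connections behind one weight
for a, b in pairs:
    print(f"{a} -> {b}")
```

Under these assumptions, a single scalar weight between the two neurons stands for all six pairings at once, which is why the connection cannot be read as one feature-to-feature relationship.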

Source paper

extracted_from

Zoom In: An Introduction to Circuits

(2020) · Chris Olah · Nick Cammarata · Ludwig Schubert · Gabriel Goh +2

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

We hypothesize that polysemantic neurons may be resolvable by unfolding networks or training to avoid polysemanticity. · hypothesis · 0.829
Forward-looking proposal for how the polysemanticity challenge to circuits research might be overcome
There is a many-to-many mapping between neurons and concepts, meaning multiple high-level causal variables might be encoded in overlapping groups of neurons · claim · 0.804
Fundamental theoretical claim motivating DAS, attributed to Smolensky/Rumelhart/McClelland.
No established method for resolving polysemantic neurons into pure features at scale · question · 0.800
Identified gap linking polysemanticity challenge to disentangled representations literature
Polysemantic Neuron · concept · 0.798
A neuron that responds to multiple unrelated inputs, posing a major challenge for circuit-level interpretation
Models with 1-hot activation sparsity still have polysemantic neurons; single neuron trained on 4 mutually exclusive features prefers polysemantic representation with loss ~0.7 vs 0.8 · finding · 0.792
Counter-example disproving that architectural sparsity alone can prevent polysemanticity
Superposition is in some sense deliberate: the model converts pure neurons into polysemantic neurons to store more features in fewer neurons. · claim · 0.780
Interpretation of the cars-in-superposition circuit finding as an intentional representational strategy
The huge chemical complexity within each synapse suggests that neural models of cognition that refer only to changing weights of synaptic connections and ignore sub-neural chemistry are probably ignoring some of the most important explanatory mechanisms in brains. · claim · 0.777
Sloman's critique of mainstream neural network theories.
Neurons can correspond to interpretable functional roles but interpretations in terms of individual neurons are unlikely to be the most parsimonious · claim · 0.774
Claim from footnote 3, acknowledging neuron-level interpretability while arguing subcomponents are better.