question
active
question:no-established-method-for-resolving-polysemantic-neurons-into-pure-features-at-scale
No established method for resolving polysemantic neurons into pure features at scale
Identified gap linking polysemanticity challenge to disentangled representations literature
Source paper
extracted_from (2020) · Chris Olah · Nick Cammarata · Ludwig Schubert · Gabriel Goh +2
Neighborhood — ranked by edge-count
Papers (1)
paper
- Zoom In: An Introduction to Circuits · associated_with
Claims (1)
claim
- Second of three speculative claims asserting that subgraphs of neural networks are tractable and meaningful objects of study
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- We hypothesize that polysemantic neurons may be resolvable by unfolding networks or training to avoid polysemanticity. · hypothesis · 0.827 · Forward-looking proposal for how the polysemanticity challenge to circuits research might be overcome
- Precise characterization of why polysemanticity poses a combinatorial obstacle to circuit analysis
- Counter-example disproving that architectural sparsity alone can prevent polysemanticity
- A neuron that responds to multiple unrelated inputs, posing a major challenge for circuit-level interpretation
- Interpretation of the cars-in-superposition circuit finding as an intentional representational strategy
- Author's conclusion after extensive investigation of architectural approaches to monosemanticity
- Central claim of the paper, supported by detailed feature analysis, human evaluation, automated interpretability of activations, and automated interpretability of logit weights
- Load-bearing theoretical claim providing the conceptual foundation for DAS.