claim
active
claim:superposition-is-in-some-sense-deliberate-the-model-converts-pure-neurons-into-polysemantic-neurons-to-store-more-features-in-fewer-neurons
Superposition is in some sense deliberate: the model converts pure neurons into polysemantic neurons to store more features in fewer neurons.
Interpretation of the cars-in-superposition circuit finding as an intentional representational strategy
Source paper
extracted_from (2020) · Chris Olah · Nick Cammarata · Ludwig Schubert · Gabriel Goh +2
Neighborhood — ranked by edge-count
Findings (1)
finding
- Circuit-level evidence that polysemantic neurons arise deliberately through superposition rather than entangled computation
Claims (1)
claim
- Mechanistic explanation for why superposition is geometrically feasible
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Superposition hypothesis: neural networks represent more features than dimensions using almost-orthogonal directions. · hypothesis · 0.838 · Explanation for why dictionary learning can recover many more features than dimensions.
- Theoretical model of how neural networks encode more features than dimensions, informing linear representation work.
- Precise characterization of why polysemanticity poses a combinatorial obstacle to circuit analysis
- We hypothesize that polysemantic neurons may be resolvable by unfolding networks or training to avoid polysemanticity. · hypothesis · 0.780 · Forward-looking proposal for how the polysemanticity challenge to circuits research might be overcome
- Central claim motivating DAS over prior methods.
- Proposes an evolutionary trajectory linking morphogenesis to neural cognition.
- Load-bearing theoretical claim providing the conceptual foundation for DAS.
- Specific phrases or sequences memorized via binary features in superposition, enabling narrow pattern matching despite few neurons
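
The superposition hypothesis listed above (more features than dimensions, stored along almost-orthogonal directions) can be illustrated with a toy sketch. This is not the circuit analysis from the source paper. It assumes random unit vectors as stand-ins for learned feature directions, and every name here (`random_unit_vectors`, `max_interference`, `encode`, `decode`, the sizes, the 0.5 threshold) is hypothetical and chosen only for illustration.

```python
import math
import random


def random_unit_vectors(n_features, n_dims, seed=0):
    """Draw n_features random directions in n_dims dimensions, each of unit length."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(n_features):
        v = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v])
    return vectors


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def max_interference(vectors):
    """Largest absolute cosine similarity between any two distinct directions."""
    return max(
        abs(dot(vectors[i], vectors[j]))
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    )


def encode(active, vectors):
    """Sum the directions of the active features into one hidden vector."""
    hidden = [0.0] * len(vectors[0])
    for i in active:
        for k, x in enumerate(vectors[i]):
            hidden[k] += x
    return hidden


def decode(hidden, vectors, threshold=0.5):
    """Read out every feature whose direction has a large projection on hidden."""
    return {i for i, v in enumerate(vectors) if dot(hidden, v) > threshold}


# Assumed toy sizes: 300 features packed into 200 dimensions.
directions = random_unit_vectors(n_features=300, n_dims=200)
worst = max_interference(directions)

# A sparse input: only two of the 300 features are active at once.
hidden = encode({3, 17}, directions)
recovered = decode(hidden, directions)

print(f"features={len(directions)} dims={len(directions[0])}")
print(f"worst pairwise |cosine| = {worst:.3f}")
print(f"recovered active features = {sorted(recovered)}")
```

The sketch relies on sparsity: with only a few features active, the interference from the non-orthogonal directions stays below the readout threshold. Activating many features at once would be expected to break the readout, which is the trade-off the superposition entries describe.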