claim

active

claim:superposition-is-in-some-sense-deliberate-the-model-converts-pure-neurons-into-polysemantic-neurons-to-store-more-features-in-fewer-neurons

Superposition is in some sense deliberate: the model converts pure neurons into polysemantic neurons to store more features in fewer neurons.

Interpretation of the cars-in-superposition circuit finding as an intentional representational strategy

Source paper

extracted_from

Zoom In: An Introduction to Circuits

(2020) · Chris Olah · Nick Cammarata · Ludwig Schubert · Gabriel Goh +2

Neighborhood — ranked by edge-count

Findings (1)

finding

InceptionV1 spreads its car feature from a pure car detector in mixed4c across dog-detector neurons in the next layer
supports
Circuit-level evidence that polysemantic neurons arise deliberately through superposition rather than entangled computation

Claims (1)

claim

Superposition exploits the geometry of high-dimensional spaces, which allow exponentially many almost-orthogonal vectors but only n strictly orthogonal ones.
extends
Mechanistic explanation for why superposition is geometrically feasible
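
The geometric point in this claim can be checked numerically. The sketch below is an illustration only and is not taken from the source paper. It assumes that random directions stand in for feature directions, and the dimension (200) and the count of directions (twice the dimension) are hypothetical values chosen to keep the run short. It draws more unit vectors than there are dimensions and reports the largest pairwise cosine similarity, which stays well below 1 even though the vectors cannot all be strictly orthogonal.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly at random from the unit sphere in `dim` dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def max_abs_cosine(vectors):
    """Largest |cosine similarity| over all distinct pairs of unit vectors."""
    worst = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            # Vectors are unit length, so the dot product is the cosine.
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            worst = max(worst, abs(dot))
    return worst


rng = random.Random(0)  # fixed seed so the run is reproducible
dim = 200               # hypothetical number of neurons (dimensions)
# Twice as many directions as dimensions: they cannot all be orthogonal.
vectors = [random_unit_vector(dim, rng) for _ in range(2 * dim)]

worst = max_abs_cosine(vectors)
print(f"{len(vectors)} directions in {dim} dimensions, max |cos| = {worst:.3f}")
```

At most `dim` of these directions could be exactly orthogonal. The pairwise cosines of random directions have a spread of roughly 1/sqrt(dim), so the interference between any two directions shrinks as the dimension grows. This small run only shows twice as many directions as dimensions; it does not demonstrate the exponential count stated in the claim.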

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Superposition hypothesis: neural networks represent more features than dimensions using almost-orthogonal directions. · hypothesis · 0.838
Explanation for why dictionary learning can recover many more features than dimensions.
Superposition in Neural Networks · concept · 0.782
Theoretical model of how neural networks encode more features than dimensions, informing linear representation work.
Polysemantic neurons are a major challenge for the circuits agenda, because N meanings in one neuron times M in another creates NxM effective connections that cannot be considered individually. · claim · 0.780
Precise characterization of why polysemanticity poses a combinatorial obstacle to circuit analysis
We hypothesize that polysemantic neurons may be resolvable by unfolding networks or training to avoid polysemanticity. · hypothesis · 0.780
Forward-looking proposal for how the polysemanticity challenge to circuits research might be overcome
DAS overcomes the localist limitation of prior causal abstraction by allowing individual neurons to play multiple roles via non-standard bases · claim · 0.777
Central claim motivating DAS over prior methods.
The remarkable ability of neurons to unify toward a centralized self is an evolutionary pivot of far earlier cell communication strategies that first solved problems in navigating anatomical morphospace. · hypothesis · 0.776
Proposes an evolutionary trajectory linking morphogenesis to neural cognition.
Smolensky (1986) proposes that viewing a neural representation under a basis that is not aligned with individual neurons can reveal the interpretable distributed structure of the neural representations. · quote · 0.772
Load-bearing theoretical claim providing the conceptual foundation for DAS.
Memorization in Superposition · concept · 0.770
Specific phrases or sequences memorized via binary features in superposition, enabling narrow pattern matching despite few neurons