hypothesis

active

hypothesis:superposition-hypothesis-neural-networks-represent-more-features-than-dimensions-using-almost-orthogonal-directions

Superposition hypothesis: neural networks represent more features than dimensions using almost-orthogonal directions.

Explanation for why dictionary learning can recover many more features than dimensions.

Source paper

extracted_from

Scaling monosemanticity: Ex-tracting interpretable features from claude 3 sonnet

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Linear representation hypothesis: neural networks represent meaningful concepts as directions in their activation spaces.hypothesis0.867
Foundation for interpreting features as linear directions.
Superposition is in some sense deliberate: the model converts pure neurons into polysemantic neurons to store more features in fewer neurons.claim0.838
Interpretation of the cars-in-superposition circuit finding as an intentional representational strategy
If the universality hypothesis is broadly true, it raises the exciting possibility that artificial neural networks could predict features previously unknown in biological neural networks.claim0.827
Speculative extension of universality to neuroscience, with high-low frequency detectors as a candidate prediction
Smolensky (1986) proposes that viewing a neural representation under a basis that is not aligned with individual neurons can reveal the interpretable distributed structure of the neural representations.quote0.816
Load-bearing theoretical claim providing the conceptual foundation for DAS.
Superposition exploits the geometry of high-dimensional spaces, which allow exponentially many almost-orthogonal vectors but only n strictly orthogonal ones.claim0.815
Mechanistic explanation for why superposition is geometrically feasible
Neural networks, trained with different objectives on different data and modalities, are converging to a shared statistical model of reality in their representation spaces.quote0.815
The paper's central thesis statement, presented prominently after the abstract
Neural networks show substantial alignment with biological representations in the brain, driven by shared task and data constraintsclaim0.812
Extends convergence argument to brain-machine alignment
Superposition in Neural Networksconcept0.805
Theoretical model of how neural networks encode more features than dimensions, informing linear representation work.