hypothesis

active

hypothesis:linear-representation-hypothesis-neural-networks-represent-meaningful-concepts-as-directions-in-their-activation-spaces

Linear representation hypothesis: neural networks represent meaningful concepts as directions in their activation spaces.

Foundation for interpreting features as linear directions.
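As a rough illustration of what "a concept as a direction" means in practice, the sketch below estimates a direction from synthetic activations using a difference of class means, then scores activations by projecting onto it. Everything here is an assumption made for illustration: the data is random, the names `concept_direction` and `concept_score` are hypothetical, and difference of means is a simple stand-in, not the method of the source paper.

```python
import numpy as np


def concept_direction(acts_with: np.ndarray, acts_without: np.ndarray) -> np.ndarray:
    """Unit vector pointing from the 'concept absent' mean to the 'concept present' mean."""
    diff = acts_with.mean(axis=0) - acts_without.mean(axis=0)
    return diff / np.linalg.norm(diff)


def concept_score(acts: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project each activation vector (one per row) onto the concept direction."""
    return acts @ direction


# Synthetic setup (assumption): a single planted direction plus isotropic noise.
rng = np.random.default_rng(0)
d = 64
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)

acts_without = rng.normal(scale=0.5, size=(200, d))
acts_with = rng.normal(scale=0.5, size=(200, d)) + 3.0 * true_dir

est_dir = concept_direction(acts_with, acts_without)
cosine_to_truth = float(est_dir @ true_dir)
score_gap = float(
    concept_score(acts_with, est_dir).mean()
    - concept_score(acts_without, est_dir).mean()
)
```

If the hypothesis holds for a concept, a single projection like `concept_score` should separate inputs where the concept is present from inputs where it is absent; in this toy setup that is true by construction.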

Source paper

extracted_from

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Superposition hypothesis: neural networks represent more features than dimensions using almost-orthogonal directions. · hypothesis · 0.867
Explanation for why dictionary learning can recover many more features than dimensions.
"Representation geometry is a window into the inner world of neural networks." · quote · 0.825
The paper's concluding summary statement asserting the deep interpretive significance of representation geometry.
Neural networks, trained with different objectives on different data and modalities, are converging to a shared statistical model of reality in their representation spaces. · quote · 0.821
The paper's central thesis statement, presented prominently after the abstract.
Smolensky (1986) proposes that viewing a neural representation under a basis that is not aligned with individual neurons can reveal the interpretable distributed structure of the neural representations. · quote · 0.817
Load-bearing theoretical claim providing the conceptual foundation for DAS.
Neural networks show substantial alignment with biological representations in the brain, driven by shared task and data constraints. · claim · 0.815
Extends the convergence argument to brain-machine alignment.
Assuming linear representations enables identifying the location of certain variables in a DNN, but many insights fail to generalise when more powerful non-linear maps are used. · claim · 0.815
Interpretive claim about what linear DAS results actually tell us.
Neural representations carry rich geometric structure; but does that structure causally shape behavior? · quote · 0.814
Opening sentence framing the paper's core inquiry.
Neural Representations of Location Composed of Spatially Periodic Bands (Krupic et al., 2012) · concept · 0.812
Discovery of band cells; TEM-t also recapitulates these representations.