claim
active
claim:features-are-the-fundamental-unit-of-neural-networks-they-correspond-to-directions-and-can-be-rigorously-studied-and-understood
Features are the fundamental unit of neural networks; they correspond to directions and can be rigorously studied and understood.
First of three speculative claims forming the foundation of the circuits research agenda
Source paper
extracted_from · (2020) · Chris Olah · Nick Cammarata · Ludwig Schubert · Gabriel Goh +2
Neighborhood — ranked by edge-count
Papers (1)
paper
- Zoom In: An Introduction to Circuits · introduces
Findings (1)
finding
- Empirical basis for treating curve detectors as a canonical example of meaningful, understandable features
Frameworks (1)
framework
- Schwann's Three Claims about Cells · analogous_to · Historical structural analogy for the paper's three claims; illustrates the value of bold speculative articulation even when partly wrong
Claims (1)
claim
- Second of three speculative claims asserting that subgraphs of neural networks are tractable and meaningful objects of study
Questions (1)
question
- Central motivating question for the circuits research program
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Scalar function of the input corresponding to a direction in the vector space of neuron activations; claimed to be the fundamental unit of neural networks
- The paper's central thesis statement, presented prominently after the abstract
- Vision of the emerging paradigm shift in society.
- Decoder cosine similarity maps onto concept similarity.
- Superposition hypothesis: neural networks represent more features than dimensions using almost-orthogonal directions. · hypothesis · 0.785 · Explanation for why dictionary learning can recover many more features than dimensions.
- Extends convergence argument to brain-machine alignment
- Linear representation hypothesis: neural networks represent meaningful concepts as directions in their activation spaces. · hypothesis · 0.783 · Foundation for interpreting features as linear directions.
- Vision statement in the conclusion.
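
A minimal sketch of what "features correspond to directions" could mean in practice, assuming a feature is read out as the projection of a layer's activation vector onto a fixed direction (the definition listed above: a scalar function of the input corresponding to a direction in activation space). The same snippet computes the cosine similarity between two directions, the kind of score used to rank the similarity list. The function names, the 3-neuron layer, and all numeric values are hypothetical illustrations, not taken from the source paper.

```python
import numpy as np


def feature_activation(activations, direction):
    """Project activation vectors onto a unit-normalised feature direction.

    activations: array of shape (n_inputs, n_neurons)
    direction:   array of shape (n_neurons,), any non-zero length
    Returns one scalar per input: the feature's value on that input.
    """
    direction = np.asarray(direction, dtype=float)
    unit = direction / np.linalg.norm(direction)
    return np.asarray(activations, dtype=float) @ unit


def cosine_similarity(u, v):
    """Cosine of the angle between two direction vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical 3-neuron layer with activations recorded on three inputs.
activations = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [3.0, 4.0, 0.0],
])

# Hypothetical feature direction; it normalises to [0.6, 0.8, 0.0].
direction = np.array([3.0, 4.0, 0.0])

scores = feature_activation(activations, direction)
print(scores)  # approximately [0.6, 1.6, 5.0]

# Orthogonal directions have similarity 0; nearly parallel ones approach 1.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

Under this reading, the feature need not align with any single neuron: the third input scores highest because its activation vector points along the chosen direction, even though it is spread across two neurons.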