claim

active

claim:features-are-the-fundamental-unit-of-neural-networks-they-correspond-to-directions-and-can-be-rigorously-studied-and-understood

Features are the fundamental unit of neural networks; they correspond to directions and can be rigorously studied and understood.

First of three speculative claims forming the foundation of the circuits research agenda

Source paper

extracted_from

Zoom In: An Introduction to Circuits

(2020) · Chris Olah · Nick Cammarata · Ludwig Schubert · Gabriel Goh +2

Neighborhood — ranked by edge-count

Papers (1)

paper

Zoom In: An Introduction to Circuits
introduces

Findings (1)

finding

Curve detecting neurons found in every non-trivial vision model carefully examined
supports
Empirical basis for treating curve detectors as a canonical example of meaningful, understandable features

Frameworks (1)

framework

Schwann's Three Claims about Cells
analogous_to
Historical structural analogy for the paper's three claims; illustrates value of bold speculative articulation even when partly wrong

Claims (1)

claim

Features are connected by weights forming circuits, and these circuits can be rigorously studied and understood as meaningful algorithms.
supports
Second of three speculative claims asserting that subgraphs of neural networks are tractable and meaningful objects of study

Questions (1)

question

What kind of picture of neural networks would emerge if we treated individual neurons, even individual weights, as being worthy of serious investigation?
gates
Central motivating question for the circuits research program

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Feature (neural network)concept0.858
Scalar function of the input corresponding to a direction in the vector space of neuron activations; claimed to be the fundamental unit of neural networks
Neural networks, trained with different objectives on different data and modalities, are converging to a shared statistical model of reality in their representation spaces.quote0.809
The paper's central thesis statement, presented prominently after the abstract
Four features will be fundamental: (1) all processes rethought as morphogenetic; (2) sequences link into a net; (3) continuous evolution becomes widespread; (4) an ethical obligation to heal the land emerges.claim0.792
Vision of the emerging paradigm shift in society.
The features are often organized in geometrically-related clusters that share a semantic relationship.claim0.790
Decoder cosine similarity maps onto concept similarity.
Superposition hypothesis: neural networks represent more features than dimensions using almost-orthogonal directions.hypothesis0.785
Explanation for why dictionary learning can recover many more features than dimensions.
Neural networks show substantial alignment with biological representations in the brain, driven by shared task and data constraintsclaim0.783
Extends convergence argument to brain-machine alignment
Linear representation hypothesis: neural networks represent meaningful concepts as directions in their activation spaces.hypothesis0.783
Foundation for interpreting features as linear directions.
Ultimately, we would like to understand neural networks well enough to be able to intentionally design them.quote0.781
Vision statement in the conclusion.