claim

active

claim:individual-floating-point-number-weights-in-neural-networks-become-meaningful-once-you-understand-the-features-they-connect

Individual floating-point number weights in neural networks become meaningful once you understand the features they connect.

Interpretive claim that circuits render raw weights interpretable as algorithmic structures

Source paper

extracted_from

Zoom In: An Introduction to Circuits

(2020) · Chris Olah · Nick Cammarata · Ludwig Schubert · Gabriel Goh +2

Neighborhood — ranked by edge-count

Findings (1)

finding

Weights between early and full curve detectors in InceptionV1 form a curve of positive weights at tangent positions, with opposing orientations inhibitory
supports
Demonstrates that meaningful algorithms can be read directly off floating-point weights in a neural network

Quotes (1)

quote

"You can literally read meaningful algorithms off of the weights."
supports
Load-bearing claim about the tractability of circuit analysis; central thesis of Claim 2

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

What kind of picture of neural networks would emerge if we treated individual neurons, even individual weights, as being worthy of serious investigation?question0.802
Central motivating question for the circuits research program
Superposition hypothesis: neural networks represent more features than dimensions using almost-orthogonal directions.hypothesis0.787
Explanation for why dictionary learning can recover many more features than dimensions.
Learning neural networks can enable 'chunking' and rescale problem-solving to higher organizational levels, a mechanism intrinsic to transitions in individuality.hypothesis0.783
Linear representation hypothesis: neural networks represent meaningful concepts as directions in their activation spaces.hypothesis0.783
Foundation for interpreting features as linear directions.
Features are connected by weights forming circuits, and these circuits can be rigorously studied and understood as meaningful algorithms.claim0.779
Second of three speculative claims asserting that subgraphs of neural networks are tractable and meaningful objects of study
There is a growing similarity in how datapoints are represented in different neural network models, spanning different architectures, training objectives, and data modalitiesclaim0.777
Primary empirical claim of the paper
Ultimately, we would like to understand neural networks well enough to be able to intentionally design them.quote0.773
Vision statement in the conclusion.
Neural networks, trained with different objectives on different data and modalities, are converging to a shared statistical model of reality in their representation spaces.quote0.772
The paper's central thesis statement, presented prominently after the abstract