"You can literally read meaningful algorithms off of the weights."

Load-bearing claim about the tractability of circuit analysis; central thesis of Claim 2

Source paper

extracted_from

Zoom In: An Introduction to Circuits

(2020) · Chris Olah · Nick Cammarata · Ludwig Schubert · Gabriel Goh +2

Neighborhood — ranked by edge-count

Claims (1)

claim

Individual floating-point number weights in neural networks become meaningful once you understand the features they connect.
supports
Interpretive claim that circuits render raw weights interpretable as algorithmic structures

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Features are connected by weights forming circuits, and these circuits can be rigorously studied and understood as meaningful algorithms.claim0.777
Second of three speculative claims asserting that subgraphs of neural networks are tractable and meaningful objects of study
"It is certainly very hard, and perhaps impossible, for mere humans to anticipate and rule out in advance all the disastrous ways the machine could choose to achieve a specified objective."quote0.753
Russell's statement opening Section 2 articulating the core motivation for the Contemplative AI approach
For a given task, the number of all sequences which work is tiny by comparison with the huge number of all possible sequences; less than a trillionth of all 6 × 10^23 possible sequences actually work well enough.claim0.751
A combinatorial argument that good sequences are astronomically rare, emphasizing the difficulty of discovery.
Can an interpretable symbolic algorithm be used to faithfully explain a complex neural network model?question0.747
Framing question for the paper's research program.
The affordances of software are often abstract: actions have direct effects that users cannot see and indirect effects they cannot anticipate.claim0.747
Why concepts are needed to make sense of complex systems.
The sense-making process of memory interpretation is creative as much as it is driven by information processing.claim0.745
Emphasis on creativity in reconstruction of memory.
We evaluate the relative life of a process by making predictions based on thought-experiments or simulations.claim0.745
"the 'code' these algorithms are implemented by is 'neural code', which is written, somehow, as enormous, inscrutable matrices of parameters"quote0.744
Load-bearing framing of the core interpretability problem: neural networks encode algorithms in parameter matrices rather than human-readable code.