Circuits Thread

An open scientific collaboration, hosted on the Distill Slack, that studies the inner workings of neural networks through zoomed-in mechanistic investigation

Neighborhood — ranked by edge-count

paper

thinker

Chris Olah
studies
Co-author; provided high-level research guidance, wrote introduction/discussion.

framework

Distill Circuits Thread
related_to
Prior mechanistic interpretability work reverse-engineering vision models (InceptionV1); the direct predecessor this paper extends to language models

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Circuits Framework · framework · 0.837
Mechanistic interpretability framework for understanding neural network computation as circuits of features
Circuit Finding · method · 0.801
Interpretability technique for identifying functional sub-circuits in neural networks, supported by pyvene
Circuit Motif · concept · 0.778
A recurring, abstract pattern found in circuits (e.g., equivariance, unioning over cases), inspired by circuit motifs in systems biology
Circuit Analysis · concept · 0.777
Fine-grained approach to identifying specific network components responsible for reflection, mentioned as future direction.
Sparse circuits · concept · 0.748
A goal in mechanistic interpretability to identify sparse computational subgraphs; VPD promotes sparse parameter circuits.
Interactive Circuit Visualization · method · 0.744
Interactive tool for visualizing and inspecting learned binary logic circuits using modified DigitalJS library
Recurrent Logic Circuit · concept · 0.743
The key novel property of DiffLogic CA: logic gate networks that are recurrent both spatially and temporally
Circuit Weight Reading · method · 0.735
Reading a meaningful algorithm directly off of the weights linking neurons in a circuit
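
The "cosine ≥ 0.65 · no typed edge" filter behind the list above can be sketched as follows. This is a minimal illustration, not the site's actual pipeline: the function names, the toy two-dimensional embeddings, and the edge list are all hypothetical, and only the 0.65 threshold comes from the page.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs that are similar to `target` but share no typed edge with it.

    `embeddings` maps entity name -> vector; `typed_edges` is a list of
    (source, destination) name pairs. Both structures are assumptions.
    """
    # Entities already connected to the target by any typed edge, in either direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}

    hits = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, round(score, 3)))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Toy data: vectors are made up purely to exercise the filter.
embeddings = {
    "Circuits Thread": [1.0, 0.0],
    "Circuits Framework": [0.8, 0.6],  # similar, no typed edge: kept
    "Chris Olah": [0.9, 0.1],          # similar, but already linked: dropped
    "Unrelated Entity": [0.0, 1.0],    # below threshold: dropped
}
typed_edges = [("Chris Olah", "Circuits Thread")]

result = untyped_neighbors("Circuits Thread", embeddings, typed_edges)
print(result)
```

Entities that pass this filter are the candidates for new edges or duplicate merging; an entity that already has a typed relation is excluded regardless of how high its similarity is.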