Distill Circuits Thread

Prior mechanistic interpretability work reverse-engineering vision models (InceptionV1); the direct predecessor this paper extends to language models

Neighborhood — ranked by edge-count

paper

framework

Circuits Thread
related_to
An open scientific collaboration hosted on the Distill Slack, studying the inner workings of neural networks via zoomed-in mechanistic investigation
A Mathematical Framework for Transformer Circuits
extends
Prior Anthropic paper enabling circuit-level analysis of attention-only transformers; motivates current MLP decomposition

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Circuits Framework · framework · 0.752
Mechanistic interpretability framework for understanding neural network computation as circuits of features
Circuit Motif · concept · 0.725
A recurring, abstract pattern found in circuits (e.g., equivariance, unioning over cases), inspired by circuit motifs in systems biology
Circuit Finding · method · 0.721
Interpretability technique for identifying functional sub-circuits in neural networks, supported by pyvene
Sparse circuits · concept · 0.717
A goal in mechanistic interpretability to identify sparse computational subgraphs; VPD promotes sparse parameter circuits.
Recurrent Logic Circuit · concept · 0.709
The key novel property of DiffLogic CA — logic gate networks that are recurrent both spatially and temporally
Circuit Analysis · concept · 0.704
Fine-grained approach to identifying specific network components responsible for reflection, mentioned as future direction.
Circuit Weight Reading · method · 0.701
Reading a meaningful algorithm directly off of the weights linking neurons in a circuit
character-circuit overlap · concept · 0.694
The overlap between circuits used for self-model and for modeling fictional characters; self-character is represented differently from fiction.