Circuit Weight Reading

Reading a meaningful algorithm directly off the weights that connect neurons in a circuit
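As a rough illustration of the idea, the sketch below builds a toy two-layer ReLU circuit and lists its connections by sign and strength. Everything in it is an assumption made for illustration: the weights are hand-picked rather than taken from a trained model, and the names `W1`, `b1`, `W2`, `forward`, and `describe_edges` are hypothetical.

```python
# Hypothetical toy circuit: 2 inputs -> 2 hidden ReLU neurons -> 1 output.
# Weights are hand-picked for illustration, not taken from any trained model.
W1 = [[1.0, 1.0],   # h0 sums both inputs
      [1.0, 1.0]]   # h1 also sums both inputs ...
b1 = [0.0, -1.0]    # ... but its bias means it fires only when both are on
W2 = [[1.0, -2.0]]  # out = h0 - 2 * h1


def relu(v):
    return max(0.0, v)


def forward(x):
    """Run the toy circuit on a pair of inputs."""
    h = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return sum(w * hi for w, hi in zip(W2[0], h))


def describe_edges(weights, src_names, dst_names, tol=1e-6):
    """List each nonzero connection as excitatory or inhibitory."""
    lines = []
    for row, dst in zip(weights, dst_names):
        for w, src in zip(row, src_names):
            if abs(w) > tol:
                kind = "excites" if w > 0 else "inhibits"
                lines.append(f"{src} {kind} {dst} (w={w:+.1f})")
    return lines


for line in describe_edges(W2, ["h0", "h1"], ["out"]):
    print(line)
```

Reading the weights alone suggests the algorithm: h0 counts active inputs, h1 detects both inputs being on, and the output subtracts twice h1 from h0, which is XOR on binary inputs. Evaluating `forward` on the four binary input pairs is one way to check that reading.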

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Circuit Finding · method · 0.756
Interpretability technique for identifying functional sub-circuits in neural networks, supported by pyvene
Circuit Analysis · concept · 0.740
Fine-grained approach to identifying specific network components responsible for reflection, mentioned as a future direction.
Circuits Thread · framework · 0.735
An open scientific collaboration hosted on the Distill Slack, studying the inner workings of neural networks via zoomed-in mechanistic investigation
Circuit Interpretability · concept · 0.731
Advantage of DiffLogic CA over NCA — learned rules are pure binary logic circuits that can be visualized and analyzed
Task weight · concept · 0.713
Coefficient weighting each task loss in the MTL objective.
Distill Circuits Thread · framework · 0.701
Prior mechanistic interpretability work reverse-engineering vision models (InceptionV1); the direct predecessor this paper extends to language models
Sparse circuits · concept · 0.692
A goal in mechanistic interpretability to identify sparse computational subgraphs; VPD promotes sparse parameter circuits.
Weight space · concept · 0.688
The space of the model's parameter matrices, where VPD operations take place.