method
active
method:circuit-finding
Circuit Finding
Interpretability technique for identifying functional sub-circuits in neural networks, supported by pyvene
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Fine-grained approach to identifying specific network components responsible for reflection, mentioned as future direction.
- An open scientific collaboration hosted on the Distill Slack, studying the inner workings of neural networks via zoomed-in mechanistic investigation
- Mechanistic interpretability framework for understanding neural network computation as circuits of features
- A goal in mechanistic interpretability to identify sparse computational subgraphs; VPD promotes sparse parameter circuits.
- Advantage of DiffLogic CA over NCA: the learned rules are pure binary logic circuits that can be visualized and analyzed
- Interactive tool for visualizing and inspecting learned binary logic circuits using modified DigitalJS library
- A recurring, abstract pattern found in circuits (e.g., equivariance, unioning over cases), inspired by circuit motifs in systems biology
- Reading a meaningful algorithm directly off of the weights linking neurons in a circuit