Circuit Interpretability

An advantage of DiffLogic CA over neural cellular automata (NCA): the learned rules are pure binary logic circuits that can be visualized and analyzed.
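To make concrete why a rule expressed as a binary logic circuit is open to analysis, here is a minimal sketch. The circuit below is a hypothetical hand-written full adder, not a circuit learned by DiffLogic CA, and the names `GATES`, `CIRCUIT`, `evaluate`, and `truth_table` are illustrative assumptions. The point is only that every wire holds a bit, so a small circuit can be enumerated exhaustively and each gate read off directly.

```python
from itertools import product

# A few two-input Boolean gates on bits (0 or 1).
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
}

# Hypothetical circuit for illustration only (not taken from the paper).
# Each entry is (output wire, gate name, input wire, input wire).
# Wires x0, x1, x2 are the circuit inputs.
CIRCUIT = [
    ("w0", "XOR", "x0", "x1"),
    ("w1", "AND", "x0", "x1"),
    ("w2", "XOR", "w0", "x2"),  # sum bit of a full adder
    ("w3", "AND", "w0", "x2"),
    ("w4", "OR", "w1", "w3"),   # carry bit of a full adder
]


def evaluate(circuit, inputs):
    """Evaluate the gates in listed order and return the value on every wire."""
    wires = dict(inputs)
    for out, gate, a, b in circuit:
        wires[out] = GATES[gate](wires[a], wires[b])
    return wires


def truth_table(circuit, input_names, output_names):
    """Enumerate every input combination; feasible because the circuit is small."""
    rows = []
    for bits in product((0, 1), repeat=len(input_names)):
        wires = evaluate(circuit, zip(input_names, bits))
        rows.append((bits, tuple(wires[name] for name in output_names)))
    return rows


if __name__ == "__main__":
    for bits, outs in truth_table(CIRCUIT, ["x0", "x1", "x2"], ["w2", "w4"]):
        print(bits, "->", outs)
```

Because intermediate wires are ordinary bits, the same `evaluate` call also exposes what each internal gate computes for a given input, which is the kind of inspection a continuous-valued network does not offer as directly.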

Neighborhood — ranked by edge-count

paper

artifact

Interactive Checkerboard Circuit Visualization · about
Web-based interactive visualization of the pruned checkerboard generation logic circuit
Interactive Game of Life Circuit Visualization · about
Web-based interactive visualization of the complete learned Game of Life logic circuit

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
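The selection rule described above (cosine similarity of at least 0.65 and no typed edge to this entity) can be sketched as follows. This is an assumption-laden illustration, not the system's actual implementation: the embedding vectors, the `typed_edges` mapping, and the function names `cosine` and `untyped_neighbors` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that have no
    typed edge to the target, sorted from highest to lowest score."""
    linked = typed_edges.get(target, set())
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy two-dimensional embeddings, purely illustrative.
    embeddings = {
        "A": [1.0, 0.0],
        "B": [1.0, 0.1],
        "C": [0.0, 1.0],
        "D": [0.9, 0.3],
    }
    typed_edges = {"A": {"D"}}  # hypothetical: A already has a typed edge to D
    for name, score in untyped_neighbors("A", embeddings, typed_edges):
        print(f"{name} {score:.3f}")
```

In this toy data, `D` is similar to `A` but is skipped because it already has a typed edge, and `C` falls below the threshold, so only `B` is reported as a candidate.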

interpretability · concept · 0.845
The capability to explain model predictions; a central theme of the paper, with disruption profiles as vehicle.
Automated Interpretability · framework · 0.809
Method using large language models (Claude) to generate and test explanations of features at scale
Circuit Analysis · concept · 0.809
Fine-grained approach to identifying specific network components responsible for reflection, mentioned as future direction.
Neural Network Interpretability · concept · 0.802
The field aimed at understanding what neural networks have learned; characterized as pre-paradigmatic in this paper
Circuits could act as an epistemic foundation for interpretability by breaking down model behavior into falsifiable statements about small subgraphs. · claim · 0.796
Normative vision for how the circuits agenda could resolve the pre-paradigmatic state of interpretability
Interpretability Illusion · concept · 0.790
Cases where subspace interventions change model behaviour through parallel pathways rather than the target feature
Interpretability as Natural Science · framework · 0.789
Proposed paradigm for evaluating interpretability work through empirical falsifiability rather than benchmarks or user studies
Interactive Circuit Visualization · method · 0.780
Interactive tool for visualizing and inspecting learned binary logic circuits using modified DigitalJS library