framework
active
framework:interpretability-as-natural-science
Interpretability as Natural Science
Proposed paradigm for evaluating interpretability work through empirical falsifiability rather than benchmarks or user studies
Neighborhood — ranked by edge-count
Papers (1)
paper
- Zoom In: An Introduction to Circuits · introduces
Concepts (1)
concept
- The field aimed at understanding what neural networks have learned; characterized as pre-paradigmatic in this paper
Claims (1)
claim
- Argument that circuits methodology meets natural-science standards of falsifiability
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The capability to explain model predictions; a central theme of the paper, with disruption profiles as the vehicle.
- Method using large language models (Claude) to generate and test explanations of features at scale
- Cases where subspace interventions change model behaviour through parallel pathways rather than the target feature
- Diagnosis of the state of the interpretability field, drawing on Kuhn's framework
- Advantage of DiffLogic CA over NCA — learned rules are pure binary logic circuits that can be visualized and analyzed
- Ian Goodfellow quote used to illustrate the pre-paradigmatic state of interpretability research
- The historical/hermeneutic approach adopted by the paper to analyze cybernetic diagrams in light of Flusser’s philosophy.
- An interpretability paradigm that explains computation in the model's own terms, rather than imposing top-down abstractions; VPD aims to realize this.