claim
active
claim:interpretability-today-is-a-pre-paradigmatic-field-lacking-consensus-on-objects-of-study-methods-and-evaluative-standards
Interpretability today is a pre-paradigmatic field lacking consensus on objects of study, methods, and evaluative standards.
Diagnosis of the state of the interpretability field, drawing on Kuhn's framework
Source paper
extracted_from (2020) · Chris Olah · Nick Cammarata · Ludwig Schubert · Gabriel Goh +2
Neighborhood — ranked by edge-count
Quotes (1)
quote
- Ian Goodfellow quote used to illustrate the pre-paradigmatic state of interpretability research
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The capability to explain model predictions; a central theme of the paper, with disruption profiles as vehicle.
- Motivation for VPD's parameter-focused approach.
- Proposed paradigm for evaluating interpretability work through empirical falsifiability rather than benchmarks or user studies
- Method using large language models (Claude) to generate and test explanations of features at scale
- Contrasts with auto-parallelizing compilers; flexibility of Linda.