quote
active
quote:for-interpretability-i-don-t-think-we-even-have-the-right-definitions
"For interpretability, I don't think we even have the right definitions."
Ian Goodfellow quote used to illustrate the pre-paradigmatic state of interpretability research
Source paper
extracted_from · (2020) · Chris Olah · Nick Cammarata · Ludwig Schubert · Gabriel Goh +2
Neighborhood — ranked by edge-count
Claims (1)
claim
- Diagnosis of the state of the interpretability field, drawing on Kuhn's framework
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The capability to explain model predictions; a central theme of the paper, with disruption profiles as vehicle.
- CIMC's extension of Feynman's dictum articulating the ethical imperative alongside the epistemological one
- Russell's statement opening Section 2 articulating the core motivation for the Contemplative AI approach
- Proposed paradigm for evaluating interpretability work through empirical falsifiability rather than benchmarks or user studies
- Load-bearing epistemological statement; Schrödinger argues that current ignorance does not imply impossibility, which motivates the search for a deeper theory.
- Verbatim output under deception feature amplification illustrating recursive self-negation under amplification
- Cases where subspace interventions change model behaviour through parallel pathways rather than the target feature
- Method using large language models (Claude) to generate and test explanations of features at scale