concept
active
concept:curse-of-dimensionality-in-interpretability
Curse of Dimensionality in Interpretability
As models grow larger, the volume of the latent space grows exponentially with its dimensionality, making exhaustive enumeration infeasible without decomposition.
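A rough arithmetic sketch of the claim above, not taken from the source: assume each latent dimension is discretized into a fixed number of bins, so a full grid over `dims` dimensions has `bins_per_dim ** dims` cells. The function names and the idea of "independent factors" are illustrative assumptions; they stand in for whatever decomposition an interpretability method actually finds.

```python
def grid_cells(bins_per_dim: int, dims: int) -> int:
    """Cells in a uniform grid with bins_per_dim bins along each of dims axes."""
    return bins_per_dim ** dims


def decomposed_cost(bins_per_dim: int, dims: int) -> int:
    """Cells to inspect if the space splits into dims independent 1D factors.

    Assumption for illustration only: real latent spaces rarely factorize
    this cleanly, so this is a best-case lower bound.
    """
    return bins_per_dim * dims


if __name__ == "__main__":
    for d in (2, 16, 128, 1024):
        joint = grid_cells(10, d)
        split = decomposed_cost(10, d)
        # Report the joint count by its digit length, since it quickly
        # becomes too large to read as a plain number.
        print(f"dims={d:>4}  joint grid: {len(str(joint))} digits  decomposed: {split}")
```

Under these assumptions the joint count grows exponentially in the number of dimensions while the decomposed count grows linearly, which is the gap that decomposition is meant to close.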
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Cases where subspace interventions change model behaviour through parallel pathways rather than the target feature
- The capability to explain model predictions; a central theme of the paper, with disruption profiles as the vehicle.
- Ian Goodfellow quote used to illustrate the pre-paradigmatic state of interpretability research
- Multi-dimensional linear and non-linear interpretability methods have not been benchmarked on CausalGym · question · 0.733 · Identified gap in benchmark coverage; only 1D linear methods are benchmarked
- An interpretability paradigm that explains computation in the model's own terms, rather than imposing top-down abstractions; VPD aims to realize this.
- Advantage of DiffLogic CA over NCA — learned rules are pure binary logic circuits that can be visualized and analyzed
- Proposed paradigm for evaluating interpretability work through empirical falsifiability rather than benchmarks or user studies
- The field aimed at understanding what neural networks have learned; characterized as pre-paradigmatic in this paper