Curse of Dimensionality in Interpretability

As models grow larger, the volume of their latent space grows exponentially with its dimensionality, making exhaustive enumeration infeasible without decomposition.
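
A minimal sketch of that exponential growth, assuming the latent space is discretized into a regular grid with a fixed number of bins per axis. The function name `grid_cells` and the choice of 10 bins are illustrative assumptions, not taken from the source.

```python
def grid_cells(bins_per_axis: int, dims: int) -> int:
    """Count the cells of a regular grid with `bins_per_axis` bins on each of `dims` axes."""
    return bins_per_axis ** dims


# Hypothetical setting: 10 bins per axis, increasing dimensionality.
for dims in (1, 2, 10, 100):
    print(f"{dims} dims: {grid_cells(10, dims)} cells to enumerate")
```

Each added dimension multiplies the cell count by the number of bins per axis, so exhaustive enumeration becomes impractical after only a handful of dimensions. Decomposition avoids this by analyzing a small number of components instead of every cell.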

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Interpretability Illusion · concept · 0.759
Cases where subspace interventions change model behaviour through parallel pathways rather than the target feature.
interpretability · concept · 0.749
The capability to explain model predictions; a central theme of the paper, with disruption profiles as vehicle.
"For interpretability, I don't think we even have the right definitions." · quote · 0.737
Ian Goodfellow quote used to illustrate the pre-paradigmatic state of interpretability research.
Multi-dimensional linear and non-linear interpretability methods have not been benchmarked on CausalGym · question · 0.733
Identified gap in benchmark coverage; only 1D linear methods are benchmarked.
Bottom-up interpretability · concept · 0.731
An interpretability paradigm that explains computation in the model's own terms, rather than imposing top-down abstractions; VPD aims to realize this.
Circuit Interpretability · concept · 0.729
Advantage of DiffLogic CA over NCA: learned rules are pure binary logic circuits that can be visualized and analyzed.
Interpretability as Natural Science · framework · 0.729
Proposed paradigm for evaluating interpretability work through empirical falsifiability rather than benchmarks or user studies.
Neural Network Interpretability · concept · 0.725
The field aimed at understanding what neural networks have learned; characterized as pre-paradigmatic in this paper.