Sparse circuits

A goal in mechanistic interpretability: to identify sparse computational subgraphs. VPD promotes sparse parameter circuits.

Neighborhood — ranked by edge-count

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Minimal Circuit Description · concept · 0.798
Hypothesis that discretization finds minimum-size circuits equivalent to minimal algorithmic descriptions of patterns

Circuit Finding · method · 0.776
Interpretability technique for identifying functional sub-circuits in neural networks, supported by pyvene

Sparse Feature Dictionary · concept · 0.760
The extracted set of sparse interpretable features from model embeddings via SAEs

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models (Marks et al., 2025) · concept · 0.760
Cited as enabling precise behavioral control through SAE features, extending the same methodological line

Sparse and smooth coding · concept · 0.760
Coding scheme where qualities are represented by few neurons with continuous similarity relations.

Noisy Simulation of Sparse Networks · concept · 0.755
Mechanism by which superposition works: small neural networks exploit sparsity to approximately simulate much larger sparse networks

Sparse Probing · method · 0.754
Method from Gurnee et al. 2023 for finding feature directions including individual neuron analysis

Sparse Autoencoder · framework · 0.751
Interpretability framework used to decompose layer-40 activations into sparse feature sets for studying emotional alignment and persistence
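
The neighborhood criterion stated above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `untyped_neighbors`, the toy two-dimensional embeddings, the placeholder entity names, and the choice to sort by similarity are hypothetical and do not describe how this page is actually computed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities that are similar to `target`
    but share no typed edge with it, sorted by descending similarity."""
    hits = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities already linked to the target by a typed relation.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, round(score, 3)))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Toy data (made up): 2-D vectors stand in for real entity embeddings.
embeddings = {
    "Sparse circuits": [1.0, 0.0],
    "A": [0.8, 0.6],  # similar, no typed edge: kept
    "B": [0.0, 1.0],  # orthogonal: dropped
    "C": [0.6, 0.8],  # below the 0.65 threshold: dropped
    "D": [1.0, 0.0],  # identical direction but already has a typed edge: dropped
}
typed_edges = {("Sparse circuits", "D")}

result = untyped_neighbors("Sparse circuits", embeddings, typed_edges)
print(result)
```

Entities returned by such a filter are the candidates this section describes: close enough in embedding space to merit a new typed edge or a duplicate check.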