concept
active
concept:sparse-circuits
Sparse circuits
A goal in mechanistic interpretability: identifying sparse computational subgraphs (circuits) within a network. VPD promotes sparse parameter circuits.
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Hypothesis that discretization finds minimum-size circuits equivalent to minimal algorithmic descriptions of patterns
- Interpretability technique for identifying functional sub-circuits in neural networks, supported by pyvene
- The extracted set of sparse interpretable features from model embeddings via SAEs
- Cited as enabling precise behavioral control through SAE features, extending the same methodological line
- Coding scheme where qualities are represented by few neurons with continuous similarity relations
- Mechanism by which superposition works: small neural networks exploit sparsity to approximately simulate much larger sparse networks
- Method from Gurnee et al. 2023 for finding feature directions including individual neuron analysis
- Interpretability framework used to decompose layer-40 activations into sparse feature sets for studying emotional alignment and persistence