method
active
method:circuit-weight-reading

Circuit Weight Reading

Reading a meaningful algorithm directly off the weights that connect neurons in a circuit.
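As a rough illustration of the definition above, the sketch below reads a rule off a toy two-layer circuit. Everything in it is an assumption made for illustration: the weights are hand-written rather than taken from a trained model, the names `W_IN`, `W_OUT`, `matmul`, and `describe` are hypothetical, and the layers are purely linear so that the input-to-output connection is simply the product of the two weight matrices.

```python
# Toy, hand-written weights (hypothetical, not from any trained model).
# Rows are downstream neurons, columns are upstream units.
W_IN = [
    [1.0, -1.0, 0.0],   # hidden h0 = a - b
    [0.0,  0.0, 1.0],   # hidden h1 = c
]
W_OUT = [
    [2.0, 0.0],         # output y = 2 * h0, ignores h1
]
INPUT_NAMES = ["a", "b", "c"]


def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(l * r for l, r in zip(row, col)) for col in zip(*right)]
        for row in left
    ]


def describe(weights, names, tol=1e-9):
    """Render one neuron's weight row as a readable linear expression."""
    terms = [f"{w:+g}*{n}" for w, n in zip(weights, names) if abs(w) > tol]
    return " ".join(terms) if terms else "0"


# With no nonlinearity between the layers, the effective connection
# from inputs to the output is the product of the weight matrices.
effective = matmul(W_OUT, W_IN)
summary = describe(effective[0], INPUT_NAMES)
print(summary)
```

The printed expression is the "algorithm" read off the weights: the output neuron responds to the difference between `a` and `b` and ignores `c`. Real circuits contain nonlinearities and many more units, so this linear shortcut is only a simplified stand-in for the idea.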

Neighborhood — ranked by edge-count

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Circuit Finding · method · 0.756
    Interpretability technique for identifying functional sub-circuits in neural networks, supported by pyvene.
  • Circuit Analysis · concept · 0.740
    Fine-grained approach to identifying specific network components responsible for reflection, mentioned as a future direction.
  • Circuits Thread · framework · 0.735
    An open scientific collaboration hosted on Distill Slack studying the inner workings of neural networks via zoomed-in mechanistic investigation.
  • Advantage of DiffLogic CA over NCA — learned rules are pure binary logic circuits that can be visualized and analyzed
  • Task weight · concept · 0.713
    Coefficient weighting each task loss in the MTL objective.
  • Prior mechanistic interpretability work reverse-engineering vision models (InceptionV1); the direct predecessor this paper extends to language models
  • Sparse circuits · concept · 0.692
    A goal in mechanistic interpretability to identify sparse computational subgraphs; VPD promotes sparse parameter circuits.
  • Weight space · concept · 0.688
    The space of the model's parameter matrices, where VPD operations take place.