claim
active
claim:circuits-could-act-as-an-epistemic-foundation-for-interpretability-by-breaking-down-model-behavior-into-falsifiable-statements-about-small-subgraphs

Circuits could act as an epistemic foundation for interpretability by breaking down model behavior into falsifiable statements about small subgraphs.

Normative vision of how the circuits agenda could move interpretability beyond its pre-paradigmatic state

Source paper

extracted_from
Zoom In: An Introduction to Circuits
(2020) · Chris Olah · Nick Cammarata · Ludwig Schubert · Gabriel Goh +2

Neighborhood (ranked by edge count)

Claims (1)

claim

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood that have no typed relation to this one. These are candidates for new edges or unrecognized duplicates.
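The "related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal illustration, not the graph's actual implementation: it assumes each entity has an embedding vector and that typed edges are stored as a set of (source, target) pairs. The names `similar_untyped_neighbors`, `embeddings` and `typed_edges` are hypothetical, and sorting by score is a choice made here, not something the page specifies.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_untyped_neighbors(entity, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: entities at or above the cosine threshold that share
    no typed edge with `entity`, returned as (name, score) pairs, highest score first."""
    target = embeddings[entity]
    results = []
    for other, vec in embeddings.items():
        if other == entity:
            continue
        # Skip pairs already connected by a typed edge, in either direction.
        if (entity, other) in typed_edges or (other, entity) in typed_edges:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            results.append((other, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

With toy two-dimensional embeddings, an entity that is close but already linked is excluded, as is one that is unlinked but dissimilar:

```python
embeddings = {
    "circuits-claim": [1.0, 0.0],
    "features-claim": [0.8, 0.6],  # cosine 0.8, no typed edge -> kept
    "linked-claim": [0.9, 0.1],    # similar, but already has a typed edge -> skipped
    "unrelated": [0.0, 1.0],       # cosine 0.0 -> below threshold
}
typed_edges = {("circuits-claim", "linked-claim")}
print(similar_untyped_neighbors("circuits-claim", embeddings, typed_edges))
```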