community
active
leiden_hybrid_concepts
label: haiku
community:leiden_hybrid_concepts-run4-c0-c1-c4
Mechanistic interpretability through parameter analysis
Understanding neural network computation by examining weights, circuits, and signal routing rather than activation patterns alone.
3 members.
Drawn from 1 source
The papers/notes whose extracted claims & findings make up this cluster.
Bridges (3)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
Claims (2)
- Activation-based interpretability does not immediately explain the computations that gave rise to activations; understanding parameters is necessary for deeper insight. This motivates a shift from studying model activations ('thoughts') to understanding parameters ('the computations themselves').
- Bottom-up interpretability explains computation in the model's own terms rather than imposing top-down abstractions. VPD is positioned as advancing a paradigm shift from top-down, activation-based mechanistic interpretability to parameter-centric, data-driven discovery.
Findings (1)
- The subnetwork for predicting 'her' vs 'his' in 'the princess lost her crown' involves femaleness signal routing via attention and syntactic role detection. This detailed case study demonstrates how VPD subnetworks can be traced to reveal multiple interpretable computational pathways for a single prediction.