community

active

leiden_hybrid_concepts

label: haiku

community:leiden_hybrid_concepts-run4-c0-c1-c4

Mechanistic interpretability through parameter analysis

Understanding neural network computation by examining weights, circuits, and signal routing rather than activation patterns alone.

3 members.

Drawn from 1 source

The papers/notes whose extracted claims & findings make up this cluster.

Paper Summary: Interpreting Language Model Parameters (3 members)

Bridges (3)

Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.

Claims (2)

Activation-based interpretability does not immediately explain the computations that gave rise to the activations; understanding parameters is necessary for deeper insight.
Motivates a shift from studying model activations ('thoughts') to understanding parameters ('the computations themselves').

Bottom-up interpretability explains computation in the model's own terms rather than imposing top-down abstractions.
VPD is positioned as advancing a paradigm shift from top-down mechanistic interpretability (activation-based) to parameter-centric, data-driven discovery.

Findings (1)

The subnetwork for predicting 'her' vs. 'his' in 'the princess lost her crown' involves femaleness signal routing via attention and syntactic role detection.
Detailed case study demonstrating how VPD subnetworks can be traced to reveal multiple interpretable computational pathways for a single prediction.