community
active
leiden_hybrid_concepts
label: sonnet
community:leiden_hybrid_concepts-run2-c66Attribution graphs for transformer circuits
Mechanistic tracing of information flow through attention and MLP subcomponents for pronoun prediction tasks
3 members. Each node is clickable.
Loading graph…
Drawn from 1 source
The papers/notes whose extracted claims & findings make up this cluster.
Bridges (3)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
Findings (3)
- Attribution graph for 'the princess lost her crown' reveals a femaleness signal pathway from 'princess' through attentionOne component of the minimal subnetwork for predicting 'her', discovered via VPD attribution graph.
- Attribution graph reveals a pathway that detects the verb 'lost' and upweights object pronounsSecond component of the subnetwork for 'her', complementing the femaleness signal.
- Attribution graph tracing information flow across parameter subcomponents for specific model predictions (e.g., 'her' vs 'his' pronoun selection)Shows how VPD-identified subnetworks can be analyzed to reveal interpretable pathways of computation (e.g., gender signal routing, syntactic role detection).