community
active
leiden_hybrid_concepts
label: haiku
community:leiden_hybrid_concepts-run4-c0-c1

Mechanistic interpretability via parameter decomposition

Tracing information flow through weight matrices and attention heads using attribution graphs to identify causally important subcomponents in language models.

24 members.

Claims (16)

Findings (8)