finding
active

In the analyzed two-layer model, second-layer attention head terms account for most of the loss reduction, outweighing both the first-layer terms and the direct path

Result from the term importance analysis, which breaks down the contribution to the loss by layer.

Source paper

extracted_from
A Mathematical Framework for Transformer Circuits (2021)
