concept
active
concept:layer-normalization-ba-et-al-2016b
Layer Normalization (Ba et al., 2016b)
Layer normalization, as used in transformers and in TEM-t position-encoding preprocessing.
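For orientation, the operation can be sketched in a few lines of plain Python: each feature vector is shifted to zero mean and scaled to unit variance across its own features, then optionally rescaled by a per-feature gain and bias. The function name `layer_norm`, the parameter names, and the stabilizing constant `eps` are illustrative assumptions, not taken from this entry or from any specific transformer or TEM-t codebase.

```python
import math

def layer_norm(x, gain=None, bias=None, eps=1e-5):
    """Normalize one feature vector across its features (hypothetical helper).

    eps is an assumed small constant added for numerical stability.
    """
    n = len(x)
    mean = sum(x) / n
    # Biased (population) variance over the features of this single vector.
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    # Default to an identity affine transform when no gain or bias is given.
    gain = gain if gain is not None else [1.0] * n
    bias = bias if bias is not None else [0.0] * n
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gain, bias)]

y = layer_norm([1.0, 2.0, 3.0, 4.0])
```

Unlike batch normalization, the statistics here are computed per example, so the result does not depend on batch size or on other items in the batch.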
Neighborhood — ranked by edge-count
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Shows that Burger et al.'s layer choice corresponds to a transitional phase, not a universal property.
- Task-specific peak anchoring score for structured reasoning domains.
- Training-free technique normalizing all task gradients to the maximum gradient norm magnitude
- Layer-wise geometry shows early dip, mid-layer alignment, and late standardization across tasks (claim, 0.726). Qualitative pattern from E3.
- Representation of features spread across multiple layers, complicating dictionary learning.
- Claim that geometry-to-behavior correlates exist
- Layer-wise trajectories show early enrichment, mid-layer alignment, and late re-clustering (claim, 0.721). Qualitative geometry pattern.
- Median layer where S(ℓ) peaks, across seeds.