concept
active
concept:untied-decoder-weights
Untied Decoder Weights
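Autoencoder design choice in which the encoder and decoder weights are learned as separate matrices, rather than constraining the decoder to be the transpose of the encoder. This increases representational capacity by allowing encoder vectors to distinguish similar features.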
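The difference between untied and tied weights can be sketched in a few lines of NumPy. This is a minimal illustration, not the method of any particular paper: the layer sizes, the ReLU activation, the zero biases, and the names `W_enc`, `W_dec`, `encode`, and `decode` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden = 4, 8  # hypothetical sizes; overcomplete because d_hidden > d_model

# Untied: encoder and decoder are independent parameter matrices.
W_enc = rng.normal(size=(d_model, d_hidden))
W_dec = rng.normal(size=(d_hidden, d_model))
b_enc = np.zeros(d_hidden)
b_dec = np.zeros(d_model)


def encode(x, W_enc, b_enc):
    """Map activations to non-negative feature activations (ReLU is an assumption)."""
    return np.maximum(x @ W_enc + b_enc, 0.0)


def decode(f, W_dec, b_dec):
    """Reconstruct activations as a linear combination of decoder rows."""
    return f @ W_dec + b_dec


x = rng.normal(size=(3, d_model))  # a small batch of stand-in activations
f = encode(x, W_enc, b_enc)

# Untied reconstruction uses the separately learned decoder.
x_hat_untied = decode(f, W_dec, b_dec)

# Tied reconstruction reuses the encoder: the decoder is fixed to W_enc.T.
x_hat_tied = decode(f, W_enc.T, b_dec)

# Untying doubles the number of weight parameters in this setup.
n_params_untied = W_enc.size + W_dec.size
n_params_tied = W_enc.size
```

In the untied case the direction used to detect a feature (a column of `W_enc`) can differ from the direction used to write it back (a row of `W_dec`), which is the extra freedom the description above refers to.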
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- Sparse Autoencoder for Dictionary Learning · associated_with · Primary method introduced: trains a one-hidden-layer MLP with L1 sparsity penalty to decompose model activations into overcomplete feature dictionaries
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Logit weight contributions from a feature that arise due to superposition with other features, not from the feature's own causal role
- Adjacent ML literature on separating independent factors of variation; related to but distinct from the polysemanticity problem studied here
- Baseline multi-task learning (MTL) approach minimizing the sum of task losses with equal weights; suffers from poor task balancing
- Loss balancing using homoscedastic uncertainty.
- Identifying related features by cosine distance in SAE decoder space.
- Related research agenda seeking representations that separate conceptually distinct factors; contrasted with superposition approach
- Load-bearing claim about the tractability of circuit analysis; central thesis of Claim 2
- Correlative technique measuring the type of information encoded in distributed representations via linear predictability.
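One of the neighbors listed above concerns identifying related features by cosine distance in SAE decoder space. A rough sketch of that idea follows, using cosine similarity (one minus cosine distance) between decoder rows. The function names, the toy `W_dec` matrix, and the choice of `k` are hypothetical and chosen only for illustration.

```python
import numpy as np


def decoder_cosine_similarity(W_dec):
    """Pairwise cosine similarity between decoder rows (one row per feature)."""
    norms = np.linalg.norm(W_dec, axis=1, keepdims=True)
    unit = W_dec / np.clip(norms, 1e-12, None)  # guard against zero-norm rows
    return unit @ unit.T


def nearest_features(W_dec, index, k=3):
    """Return the k features whose decoder directions are most similar to `index`."""
    sims = decoder_cosine_similarity(W_dec)[index].copy()
    sims[index] = -np.inf  # exclude the query feature itself
    top = np.argsort(-sims)[:k]
    return [(int(j), float(sims[j])) for j in top]


# Toy decoder with four features in a 2-dimensional activation space.
W_dec = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])

neighbors = nearest_features(W_dec, index=0, k=2)
```

With untied weights the same comparison could in principle be run on encoder columns instead, and the two rankings need not agree.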