concept
active
concept:shrinkage-l1-penalty-underestimation
Shrinkage (L1 penalty underestimation)
Systematic underestimation of non-zero feature activations caused by the L1 sparsity penalty, which penalizes activation magnitude and so pulls every active feature toward zero.
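The size of the bias can be seen in a toy setting. This setting is an illustrative assumption and is not taken from this entry. Assume one feature with a unit-norm decoder direction d, an input x = a·d with true activation a > 0, and a non-negative (ReLU-style) activation f. The loss ‖x − f·d‖² + λf then reduces to (a − f)² + λf. Its minimizer is f* = max(a − λ/2, 0), so every surviving activation is shrunk by λ/2. The sketch below checks that closed form against a brute-force grid search. The function names are hypothetical.

```python
import numpy as np

def l1_objective(f, a, lam):
    """Toy one-feature loss (assumed unit-norm decoder direction):
    squared reconstruction error (a - f)^2 plus L1 penalty lam * |f|."""
    return (a - f) ** 2 + lam * np.abs(f)

def optimal_activation(a, lam):
    """Closed-form minimizer over f >= 0 (soft-thresholding): max(a - lam/2, 0)."""
    return max(a - lam / 2.0, 0.0)

def grid_minimizer(a, lam, steps=200_001):
    """Brute-force check: evaluate the loss on a fine grid of f in [0, 2a]."""
    fs = np.linspace(0.0, 2.0 * a, steps)
    return float(fs[np.argmin(l1_objective(fs, a, lam))])

true_activation = 3.0
for lam in [0.0, 0.5, 1.0, 2.0]:
    f_star = optimal_activation(true_activation, lam)
    f_grid = grid_minimizer(true_activation, lam)
    # Shrinkage = true activation minus the loss-optimal estimate (= lam/2 here).
    print(f"lam={lam}: closed form {f_star:.3f}, grid {f_grid:.3f}, "
          f"shrinkage {true_activation - f_star:.3f}")
```

Raising λ buys more sparsity, because more small activations are clamped to exactly zero. The cost is a proportionally larger underestimate of every feature that stays active.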
Related by similarity (8)
cosine ≥ 0.65 · no typed edge: entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Systematic underestimation of feature activations degrades reconstruction and interpretability.
- The objective function used to train the SAE, combining L2 reconstruction error with an L1 penalty on feature activations scaled by each feature's decoder norm.
- Type II error about cognition leads to missed opportunities for top-down control (e.g., training instead of rewiring).
- Little evidence of steganography in NLAs; meaning-preserving transformations cause only small drops in FVE (finding, similarity 0.697): quantitative evaluation showing NLAs do not heavily rely on covert encoding beyond overt language.
- Localization result from patching experiments: identifies the group (b) hidden states as the locus of truth representations.
- Cancer is interpreted as cells reverting to unicellular selfishness due to loss of gap junctional coupling, shrinking their cognitive boundary.
- Extrapolation of scaling predictive models to AGI.
- Shows the absence of abstract truth representations in the smallest model, supporting the claim that they emerge with scale.
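The SAE training objective listed among the related entities combines L2 reconstruction error with an L1 penalty scaled by decoder norm, and its L1 term is what produces this shrinkage. The NumPy sketch below rests on several assumptions:

- a single-layer ReLU SAE with the shapes given in the docstring;
- squared L2 reconstruction error;
- illustrative random weights.

`sae_loss` is a hypothetical name. The sketch also checks one reason for the decoder-norm scaling. Rescaling a feature's activation by c and its decoder row by 1/c leaves the loss unchanged, so the model cannot evade the penalty by shrinking activations while growing decoder norms.

```python
import numpy as np

def sae_loss(x, W_enc, b_enc, W_dec, b_dec, lam):
    """Batch mean of ||x - x_hat||_2^2 + lam * sum_i f_i * ||W_dec[i]||_2.

    Assumed shapes: x (batch, d_model), W_enc (d_model, n_features),
    b_enc (n_features,), W_dec (n_features, d_model), b_dec (d_model,).
    """
    f = np.maximum(0.0, x @ W_enc + b_enc)        # ReLU feature activations, f >= 0
    x_hat = f @ W_dec + b_dec                      # linear decoder
    recon = np.sum((x - x_hat) ** 2, axis=-1)      # squared L2 reconstruction error
    dec_norms = np.linalg.norm(W_dec, axis=1)      # ||W_dec[i]||_2 for each feature
    sparsity = f @ dec_norms                       # sum_i |f_i| * ||W_dec[i]||_2
    return float(np.mean(recon + lam * sparsity))

rng = np.random.default_rng(0)
d_model, n_features, batch = 8, 32, 16
x = rng.normal(size=(batch, d_model))
W_enc = rng.normal(size=(d_model, n_features)) / np.sqrt(d_model)
b_enc = np.zeros(n_features)
W_dec = rng.normal(size=(n_features, d_model)) / np.sqrt(n_features)
b_dec = np.zeros(d_model)

base = sae_loss(x, W_enc, b_enc, W_dec, b_dec, lam=5.0)

# Rescale feature 0: its activation grows by c (ReLU is positively homogeneous),
# while its decoder row shrinks by c, so x_hat and the scaled penalty are unchanged.
c = 10.0
W_enc2, b_enc2, W_dec2 = W_enc.copy(), b_enc.copy(), W_dec.copy()
W_enc2[:, 0] *= c
b_enc2[0] *= c
W_dec2[0] /= c
rescaled = sae_loss(x, W_enc2, b_enc2, W_dec2, b_dec, lam=5.0)
print(base, rescaled)  # equal up to floating-point error
```

The decoder-norm scaling closes the rescaling loophole but does not remove shrinkage itself. The penalty still grows linearly with each active f_i, so the loss-optimal activations remain biased toward zero.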