Shrinkage (L1 penalty underestimation)

Systematic underestimation of non-zero feature activations caused by the L1 sparsity penalty.
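A one-dimensional toy case can make the effect concrete. The sketch below is not the sparse autoencoder itself: it assumes a single scalar activation `a` fitted to a target `x` under the objective `0.5*(x - a)**2 + lam*abs(a)`, whose exact minimizer is the soft-threshold of `x`. The names `soft_threshold`, `shrinkage_gap`, and `lam` are illustrative and do not come from the source.

```python
def soft_threshold(x, lam):
    """Exact minimizer over a of 0.5*(x - a)**2 + lam*abs(a), for lam >= 0."""
    if x > lam:
        return x - lam   # surviving positive activation is pulled down by lam
    if x < -lam:
        return x + lam   # surviving negative activation is pulled up by lam
    return 0.0           # anything within [-lam, lam] is zeroed out


def shrinkage_gap(x, lam):
    """How much magnitude the L1 term removes from the unpenalized fit (a = x)."""
    return abs(x) - abs(soft_threshold(x, lam))


# Hypothetical target values and penalty weight, chosen only for illustration.
true_activations = [0.0, 0.05, 0.5, 2.0]
lam = 0.1
estimates = [soft_threshold(x, lam) for x in true_activations]
gaps = [shrinkage_gap(x, lam) for x in true_activations]
```

In this toy setting every activation that survives the threshold is underestimated by exactly `lam`, regardless of its size, which is the systematic bias the definition describes. In a trained SAE the encoder is learned, so the bias need not be a constant offset; the sketch shows only the direction of the effect.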

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Shrinkage from L1 penalty significantly harms sparse autoencoder performance. · claim · 0.826
Systematic underestimation of feature activations degrades reconstruction and interpretability.
SAE training loss (MSE + L1 penalty with decoder norm scaling) · method · 0.750
The objective function combining L2 reconstruction error and L1 penalty scaled by decoder norm, used to train the SAE.
Under-estimating the capacity of a system for plasticity, learning, and intelligent problem-solving greatly reduces the toolkit of techniques for understanding and controlling its behavior. · claim · 0.713
Type II error about cognition leads to missed opportunities for top-down control (e.g., training instead of rewiring).
Little evidence of steganography in NLAs; meaning-preserving transformations cause only small drops in FVE · finding · 0.697
Quantitative evaluation showing NLAs do not heavily rely on covert encoding beyond overt language.
A small group of causally-implicated hidden states encodes LLM truth representations, localized over clause-ending punctuation tokens · claim · 0.694
Localization result from patching experiments; identifies group (b) hidden states as the locus of truth representations.
Cancer as shrinking of the Self · concept · 0.691
Cancer is interpreted as cells reverting to unicellular selfishness due to loss of gap junctional coupling, shrinking their cognitive boundary.
If loss keeps going down on the test set, in the limit the model must be learning to interpret and predict all patterns represented in language, including common-sense reasoning, goal-directed optimization, and deployment of the sum of recorded human knowledge. · hypothesis · 0.688
Extrapolation of scaling predictive models to AGI.
In LLaMA-2-7B, PCA of larger_than+smaller_than shows statements clustering by surface-level characteristics (e.g., presence of token 'eighty') rather than truth value · finding · 0.686
Shows absence of abstract truth representations in smallest model, supporting scale-dependent emergence claim.
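The "SAE training loss" entry listed above names the ingredients of the objective that produces shrinkage: an L2 reconstruction term plus an L1 penalty scaled by decoder norm. A minimal NumPy sketch of such an objective follows. The source gives only those ingredients, so the ReLU encoder, the mean-over-batch reduction, the weight shapes, and all names (`sae_loss`, `W_enc`, `W_dec`, `l1_coeff`) are assumptions made for illustration.

```python
import numpy as np


def sae_loss(x, W_enc, b_enc, W_dec, b_dec, l1_coeff):
    """Reconstruction error plus decoder-norm-scaled L1 penalty (illustrative).

    Assumed shapes: x (batch, d_model), W_enc (d_model, n_features),
    W_dec (n_features, d_model), b_enc (n_features,), b_dec (d_model,).
    """
    # Assumed ReLU encoder: non-negative feature activations.
    f = np.maximum(x @ W_enc + b_enc, 0.0)
    # Linear decoder reconstructs the input from the features.
    x_hat = f @ W_dec + b_dec
    # Squared L2 reconstruction error, averaged over the batch.
    mse = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    # Each activation is weighted by the L2 norm of its decoder row.
    dec_norms = np.linalg.norm(W_dec, axis=-1)
    l1 = np.mean(np.sum(f * dec_norms, axis=-1))
    return mse + l1_coeff * l1, mse, l1
```

Because the penalty grows with every non-zero activation, minimizing the total trades some reconstruction accuracy for smaller activations, which is where the underestimation described on this page would enter.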