finding

active

finding:sae-training-loss-decreases-as-a-power-law-with-compute-budget-when-using-compute-optimal-hyperparameters

SAE training loss decreases as a power law with compute budget when using compute-optimal hyperparameters.

From the scaling-laws sweep.
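As a rough illustration of what a power-law relationship between loss and compute means in practice, the sketch below fits L(C) ≈ a · C^(-b) by linear regression in log-log space. The function name `fit_power_law`, the data points, and the constants 50.0 and 0.1 are hypothetical and chosen only for illustration; they are not values from the source paper. The sketch also assumes each point is the best loss found at that compute budget and that there is no irreducible-loss offset.

```python
import numpy as np


def fit_power_law(compute, loss):
    """Fit loss ~ a * compute**(-b) via least squares on log-transformed data.

    A pure power law is a straight line in log-log space:
        log(loss) = log(a) - b * log(compute)
    """
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
    return float(np.exp(intercept)), float(-slope)


# Synthetic, illustrative data (NOT measurements from the paper):
# one compute-optimal loss per hypothetical compute budget.
compute = np.array([1e15, 1e16, 1e17, 1e18])
loss = 50.0 * compute ** -0.1

a, b = fit_power_law(compute, loss)
print(f"a = {a:.3f}, b = {b:.3f}")
```

On real sweep data the points would scatter around the line, and the quality of the fit in log-log space is one way to check whether the power-law description holds over the range of budgets tested.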

Source paper

extracted_from

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

Neighborhood — ranked by edge-count

Claims (1)

claim

Scaling laws can be used to guide the training of sparse autoencoders.
supports
Compute-optimal hyperparameters follow predictable power-law relationships.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

SAE training loss (MSE + L1 penalty with decoder norm scaling) · method · 0.837
The objective function combining L2 reconstruction error and L1 penalty scaled by decoder norm, used to train the SAE.
Optimal learning rate decreases as a power law with compute budget. · finding · 0.807
Hyperparameter trend observed.
Current training methods rely on loss minimization, meaning the experiential profile of training is predominantly negative across billions of parameter updates. · claim · 0.778
Ethical implication about the nature of AI training experience if the thesis holds.
SAE features generalize to images despite training only on text, indicating out-of-distribution robustness. · claim · 0.774
A promising property for interpretability analysis off-distribution.
Optimal number of features scales faster than optimal number of training steps with compute budget. · finding · 0.764
Allocation result from scaling laws.
A single SAE hyperparameter procedure driven by an intrinsic dictionary health audit transfers robustly across all three EEG transformer architectures. · claim · 0.761
Key methodological contribution claim about architecture-agnostic SAE tuning.
Training models with sparse activations cannot fully prevent polysemanticity because cross-entropy loss creates incentives for polysemantic neurons even without superposition. · claim · 0.742
Author's conclusion after extensive investigation of architectural approaches to monosemanticity.
DB-MTL training losses decrease smoothly and gradient norms are lower than EW on NYUv2, indicating training stability. · finding · 0.734
Training stability analysis.
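
For the first related entity above (the SAE training loss), the objective as described, reconstruction error plus an L1 penalty scaled by decoder norm, might be sketched as follows. This is a minimal NumPy sketch under stated assumptions: a ReLU encoder, squared L2 reconstruction error, and hypothetical parameter names and shapes (`W_enc`, `b_enc`, `W_dec`, `b_dec`, `l1_coeff`). It is not the authors' implementation.

```python
import numpy as np


def sae_loss(x, W_enc, b_enc, W_dec, b_dec, l1_coeff):
    """Illustrative SAE objective: squared error + decoder-norm-scaled L1.

    Assumed shapes (hypothetical naming):
        x:     (batch, d_model)
        W_enc: (d_model, n_features), b_enc: (n_features,)
        W_dec: (n_features, d_model), b_dec: (d_model,)
    """
    # Feature activations; a ReLU encoder is assumed here.
    f = np.maximum(x @ W_enc + b_enc, 0.0)
    # Reconstruction of the input from the features.
    x_hat = f @ W_dec + b_dec
    # Squared L2 reconstruction error per example.
    recon = np.sum((x - x_hat) ** 2, axis=1)
    # L1 penalty: each activation is weighted by the L2 norm of its decoder row.
    dec_norms = np.linalg.norm(W_dec, axis=1)
    sparsity = np.sum(f * dec_norms, axis=1)
    # Average the combined objective over the batch.
    return float(np.mean(recon + l1_coeff * sparsity))
```

Scaling each activation by its decoder norm means the penalty cannot be reduced simply by shrinking activations while growing the corresponding decoder vectors.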