Scaling laws analysis for SAE hyperparameters

Sweeps the number of features and the number of training steps to find compute-optimal SAE configurations.
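As a rough illustration of this kind of analysis, the sketch below picks the lowest-loss configuration at each compute budget and fits a power law to that frontier in log-log space. Everything in it is an assumption made for illustration: the helper names `fit_power_law` and `best_per_budget`, the compute proxy (features × steps), and the loss values, which are synthetic and not results from any experiment.

```python
import numpy as np

def fit_power_law(compute, loss):
    """Fit loss ~ a * compute**(-b) by least squares on log-transformed data."""
    log_c = np.log(np.asarray(compute, dtype=float))
    log_l = np.log(np.asarray(loss, dtype=float))
    slope, intercept = np.polyfit(log_c, log_l, deg=1)
    return float(np.exp(intercept)), float(-slope)

def best_per_budget(runs):
    """Keep the lowest-loss run for each compute budget.

    runs: iterable of (n_features, n_steps, loss) tuples.
    Compute is approximated as n_features * n_steps (an assumption).
    """
    best = {}
    for n_features, n_steps, loss in runs:
        budget = n_features * n_steps
        if budget not in best or loss < best[budget][2]:
            best[budget] = (n_features, n_steps, loss)
    return best

# Synthetic sweep: two configurations per compute budget (illustrative values only).
runs = [
    (1024, 4000, 0.52), (4096, 1000, 0.47),
    (4096, 4000, 0.35), (16384, 1000, 0.33),
    (16384, 4000, 0.25), (65536, 1000, 0.26),
]

frontier = best_per_budget(runs)
budgets = sorted(frontier)
a, b = fit_power_law(budgets, [frontier[c][2] for c in budgets])
print(f"loss ~ {a:.3g} * C^(-{b:.3f})")
for c in budgets:
    print(c, frontier[c][:2])
```

Fitting in log space keeps the regression linear, but it weights relative rather than absolute errors; a real analysis may also need an irreducible-loss offset term, which this sketch omits.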

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Power law scaling · concept · 0.786
Observation that SAE loss decreases as a power law with compute budget.
Scaling laws can be used to guide the training of sparse autoencoders. · claim · 0.777
Compute-optimal hyperparameters follow predictable power-law relationships.
A single SAE hyperparameter procedure driven by an intrinsic dictionary health audit transfers robustly across all three EEG transformer architectures. · claim · 0.775
Key methodological contribution claim about architecture-agnostic SAE tuning.
Inverse Scaling Law · concept · 0.771
Hypothesis cited in paper suggesting deceptive capabilities may scale with model size.
Power Law Scaling of Data and Model Performance · concept · 0.760
Empirically observed power law relationship between data scale and model performance; supports convergence hypothesis.
Single-Layer SAE Analysis Limitation · concept · 0.747
Key limitation that prevents tracing inter-layer dynamics or how steering propagates through model depth.
SAE training loss (MSE + L1 penalty with decoder norm scaling) · method · 0.740
The objective function combining L2 reconstruction error and L1 penalty scaled by decoder norm, used to train the SAE.
SAE features can be grounded in clinical taxonomy (abnormality, age, sex, medication) to benchmark monosemanticity and entanglement. · claim · 0.736
Claim that feature grounding enables interpretability metrics.
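
The SAE training loss entry in the list above (L2 reconstruction error plus an L1 penalty scaled by decoder norm) can be sketched as follows. This is a minimal NumPy version under stated assumptions, not the implementation of any particular entry: the ReLU encoder, the bias terms, the reduction (sum over dimensions, mean over the batch), and the names `sae_loss` and `l1_coeff` are all illustrative choices.

```python
import numpy as np

def sae_loss(x, W_enc, b_enc, W_dec, b_dec, l1_coeff):
    """Reconstruction error plus decoder-norm-weighted L1 sparsity penalty.

    Shapes (assumed): x (batch, d_model), W_enc (d_model, n_features),
    W_dec (n_features, d_model), b_enc (n_features,), b_dec (d_model,).
    """
    f = np.maximum(x @ W_enc + b_enc, 0.0)        # ReLU feature activations (assumed encoder)
    x_hat = f @ W_dec + b_dec                     # linear decoder reconstruction
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    dec_norms = np.linalg.norm(W_dec, axis=-1)    # L2 norm of each feature's decoder row
    sparsity = np.mean(np.sum(f * dec_norms, axis=-1))
    return recon + l1_coeff * sparsity

# Tiny one-feature example with hand-picked values.
x = np.array([[2.0]])
loss = sae_loss(x, np.array([[1.0]]), np.zeros(1), np.array([[0.5]]), np.zeros(1), l1_coeff=0.1)
print(loss)
```

Weighting each activation by its decoder row norm means the penalty cannot be reduced simply by shrinking activations while growing decoder weights, which is the usual motivation for this scaling.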