method
active
method:scaling-laws-analysis-for-sae-hyperparameters
Scaling laws analysis for SAE hyperparameters
Sweeping number of features and training steps to find compute-optimal SAE configurations.
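As a minimal sketch of what such a sweep could look like, the Python below picks the lowest-loss run at each compute budget and fits power laws to the resulting frontier. Everything in it is an illustrative assumption rather than the method's actual setup: the cost model (compute = n_features × steps), the synthetic loss function, the grid values, and the helper names `fit_power_law` and `compute_optimal` are hypothetical.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b in log-log space; returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b

def compute_optimal(runs):
    """For each compute budget, keep the run with the lowest loss."""
    best = {}
    for run in runs:
        c = run["compute"]
        if c not in best or run["loss"] < best[c]["loss"]:
            best[c] = run
    return [best[c] for c in sorted(best)]

def synthetic_loss(n_features, compute):
    # ASSUMPTION: toy loss surface, minimized at n_features = sqrt(compute),
    # with a floor that decays as compute**-0.1. Not measured data.
    mismatch = (math.log(n_features) - 0.5 * math.log(compute)) ** 2
    return mismatch + compute ** -0.1

# ASSUMPTION: compute is modeled as n_features * steps, so fixing a budget
# and a width determines the number of training steps.
budgets = [4 ** k for k in range(5, 9)]
widths = [16, 32, 64, 128, 256, 512]
runs = [
    {"compute": c, "n_features": n, "steps": c // n, "loss": synthetic_loss(n, c)}
    for c in budgets
    for n in widths
]

frontier = compute_optimal(runs)
compute_axis = [r["compute"] for r in frontier]
_, width_exp = fit_power_law(compute_axis, [r["n_features"] for r in frontier])
_, loss_exp = fit_power_law(compute_axis, [r["loss"] for r in frontier])
print(f"optimal width ~ compute^{width_exp:.3f}, loss ~ compute^{loss_exp:.3f}")
```

With real sweep results, the `runs` list would be replaced by measured losses; the fitted exponents are only meaningful if the frontier is close to linear in log-log space.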
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Observation that SAE loss decreases as a power law with compute budget.
- Compute-optimal hyperparameters follow predictable power-law relationships.
- Key methodological contribution claim about architecture-agnostic SAE tuning.
- Hypothesis cited in paper suggesting deceptive capabilities may scale with model size.
- Empirically observed power law relationship between data scale and model performance; supports convergence hypothesis.
- Key limitation that prevents tracing inter-layer dynamics or how steering propagates through model depth.
- The objective function combining L2 reconstruction error and L1 penalty scaled by decoder norm, used to train the SAE.
- Claim that feature grounding enables interpretability metrics.
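
The training objective mentioned in the list above (L2 reconstruction error plus an L1 penalty scaled by decoder norm) can be sketched for a single input as follows. This is one plausible reading of that one-line description, not a confirmed implementation: the squared L2 error, the per-feature decoder row norm, and the names `sae_loss`, `decoder`, `b_dec`, and `l1_coeff` are assumptions.

```python
import math

def sae_loss(x, features, decoder, b_dec, l1_coeff):
    """Reconstruction error plus decoder-norm-weighted L1 sparsity penalty.

    x        : input activation vector (list of floats)
    features : non-negative feature activations, one per SAE feature
    decoder  : decoder[i] is the output direction of feature i
    b_dec    : decoder bias, same length as x
    l1_coeff : ASSUMED name for the sparsity coefficient
    """
    # Reconstruct the input as a weighted sum of decoder directions plus bias.
    x_hat = list(b_dec)
    for f_i, d_i in zip(features, decoder):
        for j, d_ij in enumerate(d_i):
            x_hat[j] += f_i * d_ij

    # Squared L2 reconstruction error (ASSUMPTION: squared, not root).
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))

    # L1 penalty where each activation is scaled by its decoder row's L2 norm.
    sparsity = sum(
        abs(f_i) * math.sqrt(sum(d * d for d in d_i))
        for f_i, d_i in zip(features, decoder)
    )
    return recon + l1_coeff * sparsity

# Toy example: feature 0 reconstructs x exactly, feature 1 is inactive.
loss = sae_loss(
    x=[1.0, 0.0],
    features=[2.0, 0.0],
    decoder=[[0.5, 0.0], [0.0, 3.0]],
    b_dec=[0.0, 0.0],
    l1_coeff=0.1,
)
```

Scaling the penalty by decoder norm means the sparsity term cannot be reduced simply by shrinking feature activations while growing the corresponding decoder directions.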