claim
active
claim:scaling-laws-can-be-used-to-guide-the-training-of-sparse-autoencoders
Scaling laws can be used to guide the training of sparse autoencoders.
Compute-optimal hyperparameters follow predictable power-law relationships.
Source paper
extracted_from
Neighborhood, ranked by edge-count
Findings (2)
finding
- Optimal number of features scales faster than optimal number of training steps with compute budget. · supports · Allocation result from scaling laws.
- From scaling laws sweep.
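
A minimal sketch of how the allocation finding above could be checked, assuming the sweep yields a compute-optimal feature count and step count for each compute budget. The data, the exponents 0.6 and 0.4, and the helper name `fit_power_law` are made up for illustration and are not results from the source paper.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = a * x**b by least squares in log-log space; return (a, b)."""
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return float(np.exp(intercept)), float(slope)

# Synthetic, hypothetical sweep results (NOT values from the paper).
compute = np.array([1e15, 1e16, 1e17, 1e18])   # compute budgets
opt_features = 2.0 * compute ** 0.6            # assumed exponent 0.6
opt_steps = 5.0 * compute ** 0.4               # assumed exponent 0.4

a_f, b_f = fit_power_law(compute, opt_features)
a_s, b_s = fit_power_law(compute, opt_steps)

# A larger fitted exponent for features than for steps would match the
# finding that optimal features scale faster than optimal training steps.
print(f"features exponent: {b_f:.3f}, steps exponent: {b_s:.3f}")
```

On real sweep data the points will not sit exactly on a line in log-log space, so the fitted exponents should be read together with the residuals of the fit.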
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Central claim of the paper: the method scales to state-of-the-art transformers.
- Empirical principle discovered during autoencoder training; led to using 8 billion training points
- Sparse Autoencoders Find Highly Interpretable Features in Language Models (Cunningham et al., 2023) · concept · 0.800 · Core methodology paper for SAE-based interpretable feature extraction
- Rationale for using simpler sparse autoencoders rather than NP-hard compressed sensing algorithms
- Forward-looking prediction about scalability of the method to larger models
- Hypothesis cited in paper suggesting deceptive capabilities may scale with model size
- Critique of activation-based interpretability methods.
- Sweeping number of features and training steps to find compute-optimal SAE configurations.