question
active
As the subject model scales, how do the ideal expansion factor and the required training data for dictionary learning change?
Scaling laws for dictionary learning are unknown, yet they are needed to assess whether the approach is feasible on frontier models.
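The sketch below makes the quantities in the question concrete. It assumes a standard one-layer sparse-autoencoder dictionary with an untied encoder and decoder, each with a bias, where the number of dictionary features is the expansion factor times the width of the activations being decomposed. `sae_size` is a hypothetical helper written for illustration, not code from the source paper. Under these assumptions the parameter count grows roughly as 2 × expansion factor × d_model², which is why it matters whether the ideal expansion factor stays constant or also grows as models get wider.

```python
def sae_size(d_model: int, expansion_factor: int) -> dict:
    """Dictionary size and parameter count for a hypothetical standard SAE.

    Assumed architecture (illustrative only):
      W_enc: d_model x n_features,  b_enc: n_features
      W_dec: n_features x d_model,  b_dec: d_model
    """
    if d_model <= 0 or expansion_factor <= 0:
        raise ValueError("d_model and expansion_factor must be positive")
    # Number of dictionary features (latent units) in the autoencoder.
    n_features = expansion_factor * d_model
    # Two weight matrices of shape d_model x n_features, plus both bias vectors.
    params = 2 * d_model * n_features + n_features + d_model
    return {"n_features": n_features, "params": params}


if __name__ == "__main__":
    # Same expansion factor, two hypothetical activation widths.
    for d in (512, 4096):
        print(d, sae_size(d, expansion_factor=8))
```

With the expansion factor held at 8, moving from a 512-wide to a 4096-wide activation space (8× wider) multiplies the parameter count by roughly 64, before accounting for any extra training tokens the larger dictionary may need.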
Source paper
extracted_from(2024) · Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1
Related by similarity (8)
cosine ≥ 0.65 · no typed edge · Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Motivation for using sparsity-based dictionary learning on language models
- Authors argue features are model properties because logit effects and ablations are consistent with feature interpretations
- What metrics can reliably tell us whether dictionary learning has successfully extracted high-quality features? · question · 0.802 · Central methodological gap: current metrics (loss, density histograms, manual inspection) are inadequate.
- What is the 'correct number of features' for dictionary learning, and is this question well-posed? · question · 0.794 · Open question about whether there is a true, discrete feature count or a continuous splitting process.
- Controls for dataset structure, showing trained model activations have richer structure than data distribution alone
- The model tends to reflect more when the question is difficult, and accuracy is generally lower for harder questions · hypothesis · 0.769 · Hypothesis explaining the negative correlation between reflection rate and accuracy without implying that reflection is harmful.
- How does different post-training data shift a model's position along persona dimensions? · question · 0.766 · Future work direction: using persona space to study the effects of training data on model character.
- Distillation of why learning generalises.