finding
active
finding:512-neuron-mlp-continues-to-yield-new-features-as-autoencoder-scales-to-131-072-features-256-expansion
512-neuron MLP continues to yield new features as autoencoder scales to 131,072 features (256× expansion)
Shows that superposition lets the MLP layer represent many more features than it has neurons
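The arithmetic behind the title, and the geometric intuition behind "more features than neurons", can be sketched as follows. This is an illustration under stated assumptions, not the source paper's method: the 512 and 131,072 figures come from the finding itself, while the helper names, the sample size of 64, and the use of random Gaussian directions (rather than learned autoencoder features) are hypothetical choices made only to show that a 512-dimensional space holds many nearly orthogonal directions.

```python
import math
import random

N_NEURONS = 512        # MLP width stated in the finding
N_FEATURES = 131_072   # autoencoder dictionary size stated in the finding


def expansion_factor(n_features: int, n_neurons: int) -> float:
    """Ratio of dictionary features to MLP neurons."""
    return n_features / n_neurons


def random_unit_vector(dim: int, rng: random.Random) -> list[float]:
    """Draw a random direction in `dim` dimensions (assumption: Gaussian, not learned)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def max_abs_cosine(vectors: list[list[float]]) -> float:
    """Largest |cosine similarity| over all distinct pairs of unit vectors."""
    worst = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            worst = max(worst, abs(dot))
    return worst


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = [random_unit_vector(N_NEURONS, rng) for _ in range(64)]  # hypothetical sample size

factor = expansion_factor(N_FEATURES, N_NEURONS)
worst = max_abs_cosine(sample)

print(f"expansion factor: {factor:.0f}x")
print(f"max |cosine| among {len(sample)} random directions: {worst:.3f}")
```

Random directions in 512 dimensions interfere only weakly with one another, which is one way to see why a layer could in principle carry far more feature directions than it has neurons. Whether learned features behave this way is an empirical question that the finding addresses, not this sketch.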
Source paper
extracted_from (2024) · Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- Superposition Hypothesis · supports · Core theoretical framework: neural networks represent more features than neurons by encoding features as directions in superposition
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Shows that loss recovery increases with autoencoder size
- Claim about the sparsity and sufficiency of the identified neuron set
- Measures how much of the MLP layer's function is explained by the learned features
- A sparse set of 28 MLP neurons at layer 18 (~0.2% of MLP) are reused across all cyclic tasks · finding · 0.769 · Quantitative finding identifying the specific neurons responsible for generic addition
- Structural finding showing modular organization within the sparse neuron set
- SAE features are not simply mirroring individual neurons.
- Central claim of the paper, supported by detailed feature analysis, human evaluation, automated interpretability of activations, and automated interpretability of logit weights
- Demonstrates that the Arabic feature is not aligned to any single neuron