claim
active
claim:the-isotropic-superposition-model-is-incomplete-because-features-cluster-into-higher-density-groups-due-to-correlated-activations-and-similar-downstream-actions
The isotropic superposition model is incomplete because features cluster into higher-density groups due to correlated activations and similar downstream actions
Authors revise their own prior Toy Models framework based on evidence from feature splitting and geometry
Source paper
extracted_from (2024) · Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- Isotropic Superposition Model · contradicts: Prior model of superposition in which features are discrete 1D objects that repel each other roughly evenly; the paper argues this model is incomplete.
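The isotropic model above can be made concrete with a small simulation. This is a minimal sketch, assuming "roughly even repulsion" is modeled by the frame potential, the sum of squared pairwise inner products between unit feature directions. That energy is a stand-in chosen here for illustration, not the source paper's method. Projected gradient descent spreads `n` feature directions in `d < n` dimensions. Because the potential is bounded below by n²/d, with equality exactly for tight frames, the result can be checked against that bound. The function names and hyperparameters (`isotropic_layout`, `steps`, `lr`, `seed`) are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def frame_potential(vectors):
    """Sum of squared inner products over all ordered pairs, including i == j."""
    return sum(dot(u, v) ** 2 for u in vectors for v in vectors)


def isotropic_layout(n_features, dim, steps=2000, lr=0.05, seed=0):
    """Spread unit feature directions by projected gradient descent on the frame potential."""
    rng = random.Random(seed)
    vecs = [normalize([rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in range(n_features)]
    for _ in range(steps):
        new_vecs = []
        for i, v in enumerate(vecs):
            # Gradient of sum_{j != i} <v, w_j>^2 with respect to v: 2 * sum_{j != i} <v, w_j> w_j.
            grad = [0.0] * dim
            for j, w in enumerate(vecs):
                if j != i:
                    c = dot(v, w)
                    for k in range(dim):
                        grad[k] += 2.0 * c * w[k]
            # Step downhill, then project back onto the unit sphere.
            new_vecs.append(normalize([v[k] - lr * grad[k] for k in range(dim)]))
        vecs = new_vecs  # all directions are updated simultaneously
    return vecs


if __name__ == "__main__":
    n, d = 6, 3  # more features than dimensions, as in superposition
    vecs = isotropic_layout(n, d)
    overlaps = [dot(vecs[i], vecs[j]) ** 2 for i in range(n) for j in range(n) if i != j]
    print("frame potential:", round(frame_potential(vecs), 4), "lower bound n^2/d:", n * n / d)
    print("mean squared overlap:", round(sum(overlaps) / len(overlaps), 4))
    print("max squared overlap:", round(max(overlaps), 4))
```

Nothing in this energy depends on which features co-activate or drive similar downstream outputs. Every pair is treated alike, and that is exactly the ingredient the claim says the isotropic picture leaves out.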
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Superposition hypothesis: neural networks represent more features than dimensions using almost-orthogonal directions. (hypothesis · 0.764) Explanation for why dictionary learning can recover many more features than dimensions.
- Interpretation of the cars-in-superposition circuit finding as an intentional representational strategy
- Authors' overall conclusion drawn from the number of interpretable features, the correspondence between activation level and intensity, sensible logit weights, and interference weights
- Core theoretical framework: neural networks represent more features than neurons by encoding features as directions in superposition
- Visual geometric evidence for the fundamental entanglement of true/false activations in harder tasks.
- Statistically rigorous analysis of Claude introspection; suggests models may have latent introspective capabilities that can be enhanced or disrupted.
- Explains a limitation of current ecological connectionist models.
- Extension of superposition hypothesis to attention layers as future research direction
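The "Related by similarity" list above could be assembled roughly as follows. This is a minimal sketch, assuming each entity has an embedding vector and typed edges are stored as (source, target) pairs. It keeps neighbors whose cosine similarity is at least 0.65, skips any entity already linked by a typed edge, and sorts the rest by score. The entity names, toy vectors, and the `similar_untyped` helper are hypothetical, and the page's actual pipeline is not described in the source.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similar_untyped(target, embeddings, typed_edges, threshold=0.65):
    """Neighbors of `target` with cosine >= threshold and no typed edge, best first."""
    results = []
    for name, vec in embeddings.items():
        linked = (target, name) in typed_edges or (name, target) in typed_edges
        if name == target or linked:
            continue  # skip the entity itself and anything already joined by a typed edge
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


if __name__ == "__main__":
    # Hypothetical toy embeddings; real entity embeddings would be much higher-dimensional.
    embeddings = {
        "claim": [1.0, 0.2, 0.0],
        "superposition_hypothesis": [0.9, 0.4, 0.1],
        "isotropic_model": [0.95, 0.1, 0.0],
        "unrelated_entity": [0.0, 0.1, 1.0],
    }
    typed_edges = {("claim", "isotropic_model")}  # e.g. a "contradicts" edge
    for name, score in similar_untyped("claim", embeddings, typed_edges):
        print(f"{name}: {score:.3f}")
```

Excluding typed neighbors is what makes this list useful as a source of candidate edges. In the toy data, `isotropic_model` is the closest vector but is dropped because it already has a typed relation.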