Feature splitting (concept:feature-splitting)
Type: concept · Status: active
Phenomenon in which a feature learned by a smaller sparse autoencoder (SAE) splits into multiple finer-grained features in a larger SAE.
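One way to look for splitting is to compare the decoder directions of a small SAE with those of a larger one. A small-SAE feature that lines up with several large-SAE features is then a candidate split. The sketch below is minimal and rests on several assumptions:
- Cosine similarity between decoder directions is used as the matching criterion.
- The 0.5 threshold and the `find_splits` helper are hypothetical.
- The vectors are toy 3-d examples, not real SAE weights.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def find_splits(small_dirs, large_dirs, threshold=0.5):
    """Hypothetical helper: map each small-SAE feature index to the
    indices of large-SAE features whose decoder directions have
    cosine similarity >= threshold with it (threshold is assumed)."""
    splits = {}
    for i, s in enumerate(small_dirs):
        splits[i] = [j for j, l in enumerate(large_dirs)
                     if cosine(s, l) >= threshold]
    return splits

# Toy decoder directions (assumed, not real weights).
# Small feature 0 points along x; the larger SAE has two features that
# both lean toward x but differ in the y component.
small = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
large = [[0.9, 0.4, 0.0], [0.9, -0.4, 0.0], [0.0, 0.1, 1.0]]

print(find_splits(small, large))  # {0: [0, 1], 1: [2]}
```

Here small feature 0 maps to two large features, which makes it a split candidate. Feature 1 maps to a single feature. In practice a fixed cosine threshold is only a heuristic, and matches would need checking against activation behavior.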
Neighborhood (ranked by edge count)
Papers (1)
Claims (1)
- Assertion about the qualitative advantages of VPD's rank-one decomposition.
Methods (1)
- UMAP Embedding of Features (supports): 2D embedding of feature direction vectors, used to visualize feature clusters and splitting geometry.
Concepts (2)
- monosemanticity (associated_with): Interpretability property where a latent feature represents a single semantic concept; benchmarked across architectures.
- Single-Token Features (associated_with): Features that fire on every instance of a single token; appear in small dictionaries as collapsed versions of many token-in-context features.
Findings (1)
- Concrete example of feature splitting revealing unexpected model structure
Related by similarity (8)
Cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Authors argue the absence of a fixed feature count is a property of the superposition geometry, not a failure of the method
- Observed across SAE scales, e.g., 'San Francisco' split into 11 features.
- Property that features activate on only a small fraction of inputs; enables compressed sensing and is what allows superposition to work
- The subdivision of properties to create smaller, individually owned lots that support unique buildings and increased density.
- Property of features that form consistently across different models trained on the same or similar data, suggesting features are real representational units
- Domain of techniques for constructing informative features from raw data; covariance pooling is a feature engineering method for token sequences.
- Metaphor treating each system feature or function as a separate application that can be independently loaded and managed.
- Method of optimizing an input to make a neuron fire maximally, used to characterize what the neuron detects; establishes a causal link.
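The "Related by similarity" rule above can be read as a filter: keep entities whose embedding has cosine similarity ≥ 0.65 with this entity and which share no typed edge with it. The sketch below assumes that reading. Only the 0.65 threshold and the "no typed edge" condition come from this page. The following are illustrative assumptions:
- The embeddings are toy values.
- The entity ids other than `concept:feature-splitting` are hypothetical.
- The `similar_untyped` helper is hypothetical.
- Sorting by similarity is assumed, not confirmed.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def similar_untyped(target, embeddings, typed_neighbors, threshold=0.65):
    """Hypothetical helper: ids of entities with cosine >= threshold to
    `target` that are not already linked by a typed edge, sorted by
    similarity (highest first)."""
    t = embeddings[target]
    hits = []
    for eid, vec in embeddings.items():
        if eid == target or eid in typed_neighbors:
            continue  # skip self and entities that already have a typed edge
        sim = cosine(t, vec)
        if sim >= threshold:
            hits.append((sim, eid))
    hits.sort(reverse=True)
    return [eid for _, eid in hits]

# Toy 2-d embeddings; all ids except the target are hypothetical.
embeddings = {
    "concept:feature-splitting": [1.0, 0.0],
    "a": [0.9, 0.1],    # cosine ~0.99
    "b": [0.8, 0.6],    # cosine 0.8
    "c": [0.0, 1.0],    # cosine 0.0, below threshold
    "d": [0.95, 0.05],  # similar, but already has a typed edge
}
typed = {"d"}
print(similar_untyped("concept:feature-splitting", embeddings, typed))  # ['a', 'b']
```

Excluding typed neighbors is what turns this list into a queue of candidate edges or possible duplicates. It also explains why loosely related entries can appear here: embedding similarity alone does not check the domain.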