hypothesis
active
hypothesis:features-may-not-be-strictly-one-dimensional-objects-higher-dimensional-feature-manifolds-may-exist-in-model-representations
Features may not be strictly one-dimensional objects; higher-dimensional feature manifolds may exist in model representations
Extension of the superposition hypothesis to account for continuous families of features
Source paper
extracted_from · (2024) · Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- General principle supported tangentially by covariance pooling work; relates to feature co-occurrence structure.
- Hypothesized extension of superposition where features may be higher-dimensional manifolds rather than 1D directions
- Authors take agnostic position on ontological status but universality evidence pushes toward features being real
- Explicitly identified research gap: anecdotal evidence exists but rigorous characterization is absent
- Interpretation of weaker PCA separation and lower ASR in smaller models
- Load-bearing interpretive claim about the layer-specificity of Burger et al.'s finding.
- The causal hypothesis motivating the use of causality (intervention) as the lens connecting representation and behavior geometry.
- Clamping feature activations causally alters model behavior in interpretable ways.