claim
active
claim:sae-features-tend-to-shatter-manifolds-into-many-small-and-apparently-unrelated-pieces-obscuring-the-overarching-semantic-structure

SAE features tend to shatter manifolds into many small and apparently unrelated pieces, obscuring the overarching semantic structure.

Core critique of sparse autoencoders (SAEs): by splitting a representation into many separate features, they break up its geometric structure, so the overall semantic organization becomes harder to see.

Source paper

extracted_from
The World Inside Neural Networks
(2026) · Geiger, Atticus · Lubana, Ekdeep Singh · Fel, Thomas · Merullo, Jack +3

Neighborhood — ranked by edge-count

Concepts (4)

concept
  • A smooth, potentially curved surface in activation space along which activations vary according to a coherent semantic dimension.
  • The individual, supposedly monosemantic directions learned by SAEs; argued here to fragment manifolds into disconnected pieces.
  • The meaningful organization of concepts in a model's representation space, claimed to be better captured by manifolds than by SAEs.
  • The phenomenon where SAEs break a smooth geometric manifold into many small, seemingly unrelated pieces, losing overarching structure.
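
The shattering idea described above can be illustrated with a toy sketch. It is not the paper's experiment and does not train an SAE: the unit circle stands in for a manifold, and `toy_sparse_features` is a hypothetical, hand-built encoder whose feature directions, bias, and count (`k`, `half_width`) are assumptions chosen so that each feature fires on one narrow arc.

```python
import numpy as np

def circle_points(n=360):
    """Sample n points on a unit circle: a toy 1-D manifold in 2-D activation space."""
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return theta, np.stack([np.cos(theta), np.sin(theta)], axis=1)

def toy_sparse_features(x, k=8, half_width=np.pi / 8):
    """Hand-built stand-in for an SAE encoder (an assumption, not a trained model).

    Feature j computes ReLU(w_j . x - b), where w_j is the unit direction at
    angle 2*pi*j/k and b = cos(half_width), so it is active only on the arc
    within half_width of its own direction.
    """
    centers = 2.0 * np.pi * np.arange(k) / k
    w = np.stack([np.cos(centers), np.sin(centers)], axis=1)  # shape (k, 2)
    b = np.cos(half_width)
    return np.maximum(x @ w.T - b, 0.0)                        # shape (n, k)

theta, x = circle_points()
acts = toy_sparse_features(x)
active = acts > 0

# Fraction of the circle on which each feature is active.
coverage = active.mean(axis=0)
# Number of features active at each point on the circle.
per_point = active.sum(axis=1)
# Co-activation counts between features; the diagonal is zeroed to keep only pairs.
off_diag = active.astype(int).T @ active.astype(int)
np.fill_diagonal(off_diag, 0)

print("coverage per feature:", coverage)
print("max features active at one point:", per_point.max())
print("total pairwise co-activations:", off_diag.sum())
```

In this construction each feature sees only its own arc and never fires together with another feature, so the per-feature view alone gives no hint that the arcs join into a single circle. Recovering that structure requires looking at how the feature directions relate geometrically, which is the gap the claim points to.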

Methods (1)

method
  • Interpretability method criticized in this paper for shattering manifolds into atomic pieces, obscuring overarching semantic structure.
