concept
active
concept:feature-splitting

Feature splitting

Phenomenon in which a single feature learned by a small sparse autoencoder (SAE) splits into multiple finer-grained features when a larger SAE is trained on the same activations.
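
One rough way to look for splitting is to compare feature directions across dictionary sizes: a small-SAE feature whose direction is close to several large-SAE directions is a candidate for having split. The sketch below is illustrative only. The function names, the toy vectors, and the 0.7 cosine threshold are assumptions made for this example and do not come from the source.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def split_candidates(small_dirs, large_dirs, threshold=0.7):
    """Map each small-SAE feature index to the large-SAE feature indices
    whose direction has cosine similarity >= threshold (assumed cutoff)."""
    groups = {}
    for i, small in enumerate(small_dirs):
        groups[i] = [
            j for j, large in enumerate(large_dirs)
            if cosine(small, large) >= threshold
        ]
    return groups

# Toy feature directions (hypothetical, 3-dimensional for readability).
small_dirs = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
large_dirs = [[0.9, 0.3, 0.0], [0.9, -0.3, 0.0], [0.0, 0.1, 1.0]]

groups = split_candidates(small_dirs, large_dirs)
# A small feature matched by more than one large feature is a split candidate.
split = {i: js for i, js in groups.items() if len(js) > 1}
print(split)  # prints {0: [0, 1]}
```

Direction similarity alone is only a heuristic; comparing which inputs activate the matched features would be a natural follow-up check.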

Neighborhood — ranked by edge-count

Claims (1)

claim

Methods (1)

method
  • 2D embedding of feature direction vectors used to visualize feature clusters and splitting geometry
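
The entry above does not say which embedding method is used. As a minimal sketch, the code below substitutes a plain PCA projection (computed by power iteration, standard library only) to place feature direction vectors in 2D, so that near-parallel directions land close together. The function names and toy vectors are hypothetical, and a nonlinear embedding could equally be used in practice.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _matvec(m, v):
    return [_dot(row, v) for row in m]

def _normalize(v):
    n = math.sqrt(_dot(v, v))
    return [x / n for x in v]

def top_component(cov, iters=200):
    """Approximate the leading eigenvector of a symmetric matrix by power iteration."""
    d = len(cov)
    v = _normalize([1.0 + 0.1 * k for k in range(d)])  # fixed, deterministic start
    for _ in range(iters):
        v = _normalize(_matvec(cov, v))
    return v

def pca_2d(vectors):
    """Project vectors onto their first two (approximate) principal components."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(d)]
    centered = [[v[k] - mean[k] for k in range(d)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    pc1 = top_component(cov)
    lam1 = _dot(pc1, _matvec(cov, pc1))
    # Deflate: remove the first component, then find the next one.
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2 = top_component(deflated)
    return [(_dot(c, pc1), _dot(c, pc2)) for c in centered]

# Toy directions: the first three form one cluster, the last three another.
directions = [
    [1.0, 0.1, 0.0, 0.0], [1.0, -0.1, 0.0, 0.0], [0.9, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 1.0], [0.0, 0.1, 0.0, 1.0], [0.0, -0.1, 0.0, 0.9],
]
points = pca_2d(directions)
for p in points:
    print(round(p[0], 3), round(p[1], 3))
```

In the resulting 2D points, a group of features that split from one coarser feature would be expected to appear as a tight cluster, which is what makes such a plot useful for inspecting splitting geometry.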

Concepts (2)

concept
  • monosemanticity
    associated_with
    Interpretability property where a latent feature represents a single semantic concept; benchmarked across architectures.
  • Single-Token Features
    associated_with
    Features that fire on every instance of a single token; in small dictionaries they appear as collapsed versions of many token-in-context features.

Findings (1)

finding

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.