finding
active
finding:single-base64-feature-a-0-45-splits-into-three-distinct-features-in-a-1-letter-specific-digit-specific-and-ascii-encoding-specific
Single base64 feature A/0/45 splits into three distinct features in A/1: letter-specific, digit-specific, and ASCII-encoding-specific
Concrete example of feature splitting revealing unexpected model structure
Source paper
extracted_from · (2024) · Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1
Neighborhood — ranked by edge-count
Claims (1)
claim
- Authors argue the absence of a fixed feature count is a property of the superposition geometry, not a failure of the method
Concepts (1)
concept
- Feature splitting · supports · Phenomenon where a feature in a small SAE splits into multiple finer features in a larger SAE.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Shows a general code error detector beyond simple typo detection.
- Universality of base64 feature across two transformers
- Four features (A/0/20, A/0/0, A/0/30, A/0/494) form an FSA-like system implementing HTML tag generation · finding · 0.762 · Concrete example of features connecting into FSA-like system implementing complex behavior
- Observed across SAE scales, e.g., 'San Francisco' split into 11 features.
- Universality of DNA feature across two transformer models with different random seeds
- Causal validation of base64 feature function via pinned feature sampling
- Demonstrates prevalence of token-in-context features and feature splitting of common tokens
- Demonstrates mechanistic memorization via feature assemblies in superposition
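
The "related by similarity" rule above (cosine ≥ 0.65, no typed edge) could be computed roughly as follows. This is only a minimal sketch: the function name `similar_untyped`, the entity ids, the two-dimensional toy embeddings, and the representation of typed edges as unordered pairs are all hypothetical and not taken from this page; only the 0.65 threshold comes from the text above.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_untyped(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs whose cosine similarity to `target`
    is at least `threshold` and which share no typed edge with it,
    sorted by descending score."""
    hits = []
    for entity_id, vec in embeddings.items():
        if entity_id == target:
            continue  # an entity is not its own neighbor
        if frozenset((target, entity_id)) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((entity_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data, for illustration only.
embeddings = {
    "base64-splitting": [1.0, 0.0],
    "html-fsa": [0.8, 0.6],           # similar, no typed edge
    "feature-splitting": [1.0, 0.0],  # similar, but already has a typed edge
    "unrelated": [0.0, 1.0],          # below the threshold
}
typed_edges = {frozenset(("base64-splitting", "feature-splitting"))}

neighbors = similar_untyped("base64-splitting", embeddings, typed_edges)
```

With this toy data, only the `html-fsa` entry passes both filters: `feature-splitting` is excluded because it already has a typed edge, and `unrelated` falls below the threshold.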