finding
active
finding:82-of-features-in-1m-sae-had-maximum-pearson-correlation-0-3-with-any-mlp-neuron-and-manual-inspection-showed-no-semantic-resemblance

82% of features in the 1M-feature SAE had a maximum Pearson correlation of ≤0.3 with any MLP neuron, and manual inspection showed no semantic resemblance.

SAE features are not simply mirroring individual neurons.
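
As a rough illustration of the statistic behind this finding, the sketch below computes, for each SAE feature, its maximum Pearson correlation with any MLP neuron, and then the share of features at or below a threshold. It is not the original analysis code: the function names (`max_neuron_correlation`, `fraction_at_or_below`), the array layout (rows are tokens, columns are features or neurons), the use of signed rather than absolute correlation, and the synthetic activations are all assumptions made for the example.

```python
import numpy as np


def max_neuron_correlation(features, neurons, eps=1e-12):
    """Maximum Pearson correlation of each feature with any neuron.

    features: (n_tokens, n_features) activations (hypothetical layout).
    neurons:  (n_tokens, n_neurons) activations over the same tokens.
    Returns an array of shape (n_features,).
    """
    # Center each column, then scale to unit norm, so that a plain
    # dot product between two columns equals their Pearson correlation.
    f = features - features.mean(axis=0, keepdims=True)
    n = neurons - neurons.mean(axis=0, keepdims=True)
    f = f / (np.linalg.norm(f, axis=0, keepdims=True) + eps)
    n = n / (np.linalg.norm(n, axis=0, keepdims=True) + eps)
    corr = f.T @ n  # (n_features, n_neurons) correlation matrix
    # Assumption: signed correlation; use np.abs(corr) for absolute.
    return corr.max(axis=1)


def fraction_at_or_below(max_corr, threshold=0.3):
    """Share of features whose best-matching neuron correlates <= threshold."""
    return float(np.mean(max_corr <= threshold))


# Synthetic data only: feature 0 copies neuron 0, the rest are independent.
rng = np.random.default_rng(0)
neurons = rng.normal(size=(5000, 8))
features = rng.normal(size=(5000, 4))
features[:, 0] = neurons[:, 0]

max_corr = max_neuron_correlation(features, neurons)
print(max_corr)
print(fraction_at_or_below(max_corr, threshold=0.3))
```

For a dictionary with around a million features, the full correlation matrix would not fit in memory, so a real run would presumably process features in batches and keep only the running maximum per feature.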

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.