method
active
method:scaled-sae-training-on-claude-3-sonnet-middle-residual-stream-layer

Scaled SAE training on Claude 3 Sonnet middle residual stream layer

Application of sparse autoencoders (SAEs) to extract features from the middle-layer residual stream of Claude 3 Sonnet, trained at three dictionary sizes (1M, 4M, and 34M features).

Neighborhood — ranked by edge-count

Methods (1)

method
  • Interpretability method criticized in this paper for shattering manifolds into atomic pieces, obscuring overarching semantic structure.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.