method
active
method:sparse-autoencoder-training-on-layer-40-activations

Sparse Autoencoder Training on Layer-40 Activations

Sparse autoencoders (SAEs) trained on 100M+ tokens to decompose per-token layer-40 activations into a dictionary of 100K+ features, of which 64 are active at a time, for interpretability analysis
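
A minimal sketch of the sparse encoding step may help make the "64 active out of 100K+" idea concrete. It assumes a top-k sparsity rule (keep the k largest encoder pre-activations, zero the rest); the entry does not say whether the SAEs enforce sparsity this way or through a penalty such as L1, so treat that as an assumption. All names here (`make_sae`, `encode_topk`, `decode`, `mse`) are hypothetical, the weights are random rather than trained, and the dimensions are toy-sized (16 inputs, 128 features, k = 8) so that the example runs with the standard library alone.

```python
import random


def make_sae(d_model, n_features, seed=0):
    """Randomly initialise toy SAE weights (illustrative, not trained)."""
    rng = random.Random(seed)
    scale = 1.0 / d_model ** 0.5
    # Encoder: one row of d_model weights per dictionary feature.
    w_enc = [[rng.gauss(0.0, scale) for _ in range(d_model)]
             for _ in range(n_features)]
    b_enc = [0.0] * n_features
    # Decoder starts as the transpose of the encoder (an assumed, common init).
    w_dec = [[w_enc[f][i] for f in range(n_features)] for i in range(d_model)]
    b_dec = [0.0] * d_model
    return w_enc, b_enc, w_dec, b_dec


def encode_topk(x, w_enc, b_enc, b_dec, k):
    """Return {feature_index: activation} for the k largest pre-activations."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [sum(w * c for w, c in zip(row, centered)) + b
           for row, b in zip(w_enc, b_enc)]
    top = sorted(range(len(pre)), key=lambda f: pre[f], reverse=True)[:k]
    # ReLU keeps the retained activations non-negative.
    return {f: max(pre[f], 0.0) for f in top}


def decode(codes, w_dec, b_dec):
    """Reconstruct the activation vector from the sparse code."""
    return [b + sum(a * w_dec[i][f] for f, a in codes.items())
            for i, b in enumerate(b_dec)]


def mse(x, x_hat):
    """Mean squared reconstruction error, the usual SAE training signal."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# Toy stand-in for one token's layer-40 activation vector.
D_MODEL, N_FEATURES, K = 16, 128, 8
w_enc, b_enc, w_dec, b_dec = make_sae(D_MODEL, N_FEATURES)
x = [random.Random(1).gauss(0.0, 1.0) for _ in range(D_MODEL)]

codes = encode_topk(x, w_enc, b_enc, b_dec, K)
x_hat = decode(codes, w_dec, b_dec)
print(f"{len(codes)} of {N_FEATURES} features kept, mse = {mse(x, x_hat):.4f}")
```

In an actual training run the weights would be fitted by minimising the reconstruction error over the token stream with an array library; the sketch only shows the shape of the forward pass, where each token ends up described by a small set of feature indices and their strengths.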

Neighborhood — ranked by edge-count

Frameworks (1)

framework
  • Interpretability framework used to decompose layer-40 activations into sparse feature sets for studying emotional alignment and persistence

Concepts (1)

concept

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.