framework
active
framework:sparse-autoencoder

Sparse Autoencoder

Interpretability framework used to decompose layer-40 activations into sparse feature sets for studying emotional alignment and persistence

Neighborhood — ranked by edge-count

Methods (1)

method

Frameworks (2)

framework
  • Primary method introduced: trains a one-hidden-layer MLP with an L1 sparsity penalty to decompose model activations into overcomplete feature dictionaries
  • The central mechanistic interpretability tool applied across all three EEG transformers to extract sparse feature dictionaries
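
A minimal sketch of the objective described in the entries above: a one-hidden-layer encoder/decoder whose hidden layer is wider than the input (overcomplete), scored with a reconstruction term plus an L1 penalty on the hidden activations. Several details are assumptions for illustration, not taken from the source: the ReLU nonlinearity, the squared-error reconstruction term, the names `init_sae`, `sae_forward`, `sae_loss` and `l1_coeff`, and the toy dimensions (4 inputs, 16 dictionary features). It shows the forward pass and the loss only, not a training loop.

```python
import random

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def init_sae(d_model, d_dict, seed=0):
    # Hypothetical initializer: small Gaussian weights, zero biases.
    rng = random.Random(seed)
    W_enc = [[rng.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(d_dict)]
    W_dec = [[rng.gauss(0.0, 0.1) for _ in range(d_dict)] for _ in range(d_model)]
    return W_enc, [0.0] * d_dict, W_dec, [0.0] * d_model

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    # Encoder: affine map followed by ReLU (assumed) gives feature activations.
    pre = [p + b for p, b in zip(matvec(W_enc, x), b_enc)]
    f = [max(0.0, p) for p in pre]
    # Decoder: affine map from features back to the activation space.
    x_hat = [r + b for r, b in zip(matvec(W_dec, f), b_dec)]
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff):
    # Squared reconstruction error (assumed) plus L1 penalty on the features.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(v) for v in f)
    return recon + l1_coeff * sparsity

# Toy example: a 4-dimensional activation and a 16-feature dictionary.
params = init_sae(d_model=4, d_dict=16)
x = [0.5, -1.0, 0.25, 2.0]
f, x_hat = sae_forward(x, *params)
loss = sae_loss(x, f, x_hat, l1_coeff=1e-3)
print(len(f), len(x_hat), loss)
```

Raising `l1_coeff` trades reconstruction quality for sparser feature activations. In practice the weights would be fitted by gradient descent over many activation vectors, which this sketch leaves out.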

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.