framework
active
framework:topk-sparse-autoencoders

TopK Sparse Autoencoders

The central mechanistic-interpretability tool, applied across all three EEG transformers to extract sparse feature dictionaries.
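
As a rough illustration of what "TopK" means here, the sketch below shows one common formulation of a TopK sparse autoencoder forward pass: encode an activation vector, keep only the k largest latent pre-activations per sample, and decode from that sparse code. The entry does not specify the variant, dimensions, or training setup used on the EEG transformers, so every name and shape (topk_sae_forward, W_enc, d_model, n_latents, k) is an assumption for illustration only.

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Encode x, keep the k largest pre-activations per row, then decode.

    Hypothetical shapes: x (batch, d_model), W_enc (d_model, n_latents),
    W_dec (n_latents, d_model). Not taken from the source entry.
    """
    pre = (x - b_dec) @ W_enc + b_enc                # (batch, n_latents)
    # Column indices of the k largest pre-activations in each row.
    top_idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    z = np.zeros_like(pre)
    rows = np.arange(pre.shape[0])[:, None]
    # ReLU on the kept values so active features are non-negative.
    z[rows, top_idx] = np.maximum(pre[rows, top_idx], 0.0)
    x_hat = z @ W_dec + b_dec                        # reconstruction
    return z, x_hat

# Toy usage with random weights (illustrative values only).
rng = np.random.default_rng(0)
d_model, n_latents, k = 16, 64, 4
W_enc = rng.normal(scale=0.1, size=(d_model, n_latents))
W_dec = W_enc.T.copy()
b_enc = np.zeros(n_latents)
b_dec = np.zeros(d_model)
x = rng.normal(size=(8, d_model))
z, x_hat = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
```

Each row of z has at most k non-zero entries, so sparsity is fixed by construction rather than tuned through a penalty term; the rows of W_dec play the role of the feature dictionary.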

Neighborhood — ranked by edge-count

Concepts (4)

concept
  • Entanglement
    implements
    Less hierarchical than embedment; multiple texts work into and out of each other, creating associations across levels and connecting any single text to the matrix of all others.
  • monosemanticity
    implements
    Interpretability property where a latent feature represents a single semantic concept; benchmarked across architectures.
  • The internal representations of EEG transformers from which SAE features are extracted.
  • The extracted set of sparse interpretable features from model embeddings via SAEs.

Frameworks (1)

framework
  • Interpretability framework used to decompose layer-40 activations into sparse feature sets for studying emotional alignment and persistence.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
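
The filter described above can be sketched as follows, assuming each entity has an embedding vector and that typed relations are available as a set of names. The function name, the data layout, and the toy vectors are hypothetical; only the 0.65 cosine threshold comes from this page.

```python
import numpy as np

def similarity_candidates(target, others, typed_edges, threshold=0.65):
    """Return (name, cosine) pairs at or above threshold that have no typed edge.

    target: 1-D embedding of this entity (hypothetical layout).
    others: dict mapping entity name to its 1-D embedding.
    typed_edges: set of names already linked by a typed relation.
    """
    t = target / np.linalg.norm(target)
    out = []
    for name, vec in others.items():
        if name in typed_edges:
            continue  # already related by a typed edge, not a candidate
        cos = float(t @ (vec / np.linalg.norm(vec)))
        if cos >= threshold:
            out.append((name, cos))
    # Most similar first.
    return sorted(out, key=lambda pair: pair[1], reverse=True)

# Toy usage with made-up 2-D embeddings.
target = np.array([1.0, 0.0])
others = {
    "a": np.array([1.0, 0.1]),   # high similarity, no typed edge
    "b": np.array([0.0, 1.0]),   # orthogonal, below threshold
    "c": np.array([1.0, 1.0]),   # above threshold but already typed
}
candidates = similarity_candidates(target, others, typed_edges={"c"})
```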