concept · active
concept:towards-monosemanticity-decomposing-language-models-with-dictionary-learning-bricken-et-al-2023

Towards Monosemanticity: Decomposing Language Models with Dictionary Learning (Bricken et al., 2023)

Foundational paper on sparse autoencoders (SAEs) for mechanistic interpretability

Neighborhood (ranked by edge count)

Venues (1)

venue

Related by similarity (8)

cosine similarity ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood that have no typed relation to this one. They are candidates for new edges or may be unrecognized duplicates.
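The similarity filter described above (cosine similarity ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, assuming each entity has a precomputed embedding vector and that typed edges are stored as unordered pairs. The function names, the data layout, and the descending-similarity sort are hypothetical, not the tool's actual implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_candidates(
    target: str,
    embeddings: dict[str, list[float]],    # hypothetical: entity id -> embedding
    typed_edges: set[frozenset[str]],      # hypothetical: unordered typed relations
    threshold: float = 0.65,
) -> list[tuple[str, float]]:
    """Return (entity, score) pairs whose cosine similarity to `target` is at
    least `threshold` and that share no typed edge with it.

    Sorting by descending similarity is an assumption for this sketch.
    """
    anchor = embeddings[target]
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        if frozenset((target, name)) in typed_edges:
            continue  # already linked by a typed relation, so not a candidate
        score = cosine(anchor, vec)
        if score >= threshold:
            results.append((name, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


if __name__ == "__main__":
    embeddings = {
        "paper": [1.0, 0.0],
        "sae-concept": [0.9, 0.1],   # very similar, no typed edge -> candidate
        "venue-x": [0.8, 0.6],       # similar, but has a typed edge -> excluded
        "unrelated": [0.0, 1.0],     # orthogonal -> excluded
    }
    typed_edges = {frozenset(("paper", "venue-x"))}
    for name, score in similarity_candidates("paper", embeddings, typed_edges):
        print(f"{name}: {score:.3f}")
```

Excluding pairs that already have a typed edge keeps the list focused on relations the graph does not yet record, which matches the page's framing of these entries as candidates for new edges or duplicates.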