claim
active
claim:sparse-autoencoders-extract-features-that-are-significantly-more-monosemantic-than-neurons-as-shown-by-four-independent-lines-of-evidence

Sparse autoencoders extract features that are significantly more monosemantic than neurons, as shown by four independent lines of evidence

Central claim of the paper, supported by four lines of evidence: detailed feature analysis, human evaluation, automated interpretability of activations, and automated interpretability of logit weights.

Source paper

extracted_from
Towards Safe and Honest AI Agents with Neural Self-Other Overlap
(2024) · Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1

Neighborhood — ranked by edge-count

Findings (7)

finding

Hypotheses (1)

hypothesis

Questions (1)

question

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.