paper
referenced-only
paper:scaling

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

Methods (13)

Findings (31)

Claims (20)

Hypotheses (2)

Questions (9)

Related work: refs + corpus + external arXiv

Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows are weighted higher.

Similar preprints: Semantic Scholar

Cited by (2)