question
active
question:will-these-methods-work-for-large-models
Will these methods work for large models?
Motivating question for the paper, addressed by scaling SAEs to Claude 3 Sonnet.
Source paper
extracted_from
Neighborhood (ranked by edge count)
Claims (1)
claim
- Central claim of the paper: the method scales to state-of-the-art transformers.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Forward-looking prediction about scalability of the method to larger models
- Clamping feature activations causally alters model behavior in interpretable ways.
- Practical scalability question addressed in Appendix D.
- Survey of representation engineering methods cited as related work
- Bigger models are more likely to converge to a shared representation than smaller models · hypothesis · 0.743 · Selective pressure toward convergence via model capacity
- Implication of PRH for AI fairness and bias
- Core claim distinguishing this paper's contribution from looser representational similarity arguments.
- Paper's assessment of current LLM capabilities relative to Turing Test