concept
active
concept:goodfire-saes-for-llama-3
Goodfire SAEs for Llama-3
Open-source sparse autoencoders (SAEs) from Goodfire, trained on instruction-tuned models and used for the Llama model experiments.
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Llama-3.3-70B-Instruct · associated_with · Primary model of interest showing substantial ESR; largest model tested in the study
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Goodfire blog post describing SAEs used for Llama models in this study
- One of four LLMs selected for representation analysis; embedding dimension D=4096; used as demonstration model in scatter plots.
- 3B Llama model tested; used for injection stride visualization
- One of four LLMs selected; larger model with D=8192 embedding dimension; analyzed across proportionally aligned layers.
- The model used in Experiment 2 for SAE feature steering experiments via Goodfire API
- Large open-weight model showing compliance gap in helpful-only setting
- Language model family used in cross-modal alignment experiments across multiple sizes
- Primary qualitative demonstration model and one of 14 LLMs benchmarked