Goodfire SAEs for Llama-3

Open-source sparse autoencoders (SAEs) released by Goodfire, trained on instruction-tuned Llama models and used in the Llama model experiments.

Neighborhood — ranked by edge-count

concept

Llama-3.3-70B-Instruct
associated_with
Primary model of interest showing substantial ESR; largest model tested in the study

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Announcing Open-Source SAEs for Llama 3.3 70B and Llama 3.1 8B (Balsam et al., 2025) · concept · 0.811
Goodfire blog post describing SAEs used for Llama models in this study
LLaMA3.1-8B · concept · 0.763
One of four LLMs selected for representation analysis; embedding dimension D=4096; used as demonstration model in scatter plots.
Llama-3.2-3B-Instruct · concept · 0.761
3B Llama model tested; used for injection stride visualization
LLaMA3.1-70B · concept · 0.761
One of four LLMs selected; larger model with D=8192 embedding dimension; analyzed across proportionally aligned layers.
LLaMA 3.3 70B · concept · 0.753
Model used in Experiment 2 for SAE feature-steering experiments via the Goodfire API
Llama 3.1 405B · concept · 0.746
Large open-weight model showing compliance gap in helpful-only setting
LLaMA / LLaMA2 / LLaMA3 · concept · 0.738
Language model family used in cross-modal alignment experiments across multiple sizes
Llama-3.1-8B-Instruct · concept · 0.735
Primary qualitative demonstration model and one of 14 LLMs benchmarked