concept

active

concept:announcing-open-source-saes-for-llama-3-3-70b-and-llama-3-1-8b-balsam-et-al-2025

Announcing Open-Source SAEs for Llama 3.3 70B and Llama 3.1 8B (Balsam et al., 2025)

Goodfire blog post describing SAEs used for Llama models in this study

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Goodfire SAEs for Llama-3 · concept · 0.811
Open-source SAEs from Goodfire used for Llama model experiments, trained on instruction-tuned models
LLaMA3.1-8B · concept · 0.809
One of four LLMs selected for representation analysis; embedding dimension D=4096; used as demonstration model in scatter plots.
LLaMA3.1-70B · concept · 0.807
One of four LLMs selected; larger model with D=8192 embedding dimension; analyzed across proportionally aligned layers.
LLaMA 3.3 70B · concept · 0.786
The model used in Experiment 2 for SAE feature-steering experiments via the Goodfire API
Llama-3.3-70B-Instruct · concept · 0.769
Primary model of interest showing substantial ESR; largest model tested in the study
Llama-3.1-8B-Instruct · concept · 0.769
Primary qualitative demonstration model and one of 14 LLMs benchmarked
Llama-3.2-3B-Instruct · concept · 0.766
3B Llama model tested; used for injection stride visualization
Llama 3.1 405B · concept · 0.765
Large open-weight model showing compliance gap in helpful-only setting