Llama-3.3-70B-Instruct

Primary model of interest showing substantial ESR; largest model tested in the study

Neighborhood — ranked by edge-count

paper

concept

Llama-3.1-8B-Instruct
related_to
Primary qualitative demonstration model and one of 14 LLMs benchmarked
Meta-Llama-3.1-8B-Instruct
related_to
Backbone model used in E3 geometry analysis.
Llama-3.2-3B-Instruct
related_to
3B Llama model tested; used for injection stride visualization
LLaMA 3.3 70B
related_to
The model used in Experiment 2 for SAE feature steering experiments via Goodfire API
Llama-3.2-1B-Instruct
related_to
Smallest Llama model tested; benchmarked across all injection methods
LLaMA3.1-70B
related_to
One of four LLMs selected; larger model with D=8192 embedding dimension; analyzed across proportionally aligned layers.
Goodfire SAEs for Llama-3
associated_with
Open-source SAEs from Goodfire used for Llama model experiments, trained on instruction-tuned models

finding

All five judge models consistently rank Llama-3.3-70B as having substantially higher ESR rates than other models
cites
Cross-judge validation of the primary ESR finding across OpenAI, Alibaba, Anthropic, and Google judge models

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Olmo-3-7B-Instruct · concept · 0.829
7B OLMo model tested; used for layerwise steering visualization (Figure 4)
LLaMA3.1-8B · concept · 0.827
One of four LLMs selected for representation analysis; embedding dimension D=4096; used as demonstration model in scatter plots.
Olmo-3.1-32B-Instruct · concept · 0.824
32B OLMo model quantized to 4-bit NF4; tested in OCEAN benchmarks
Llama 3.3 70B is the most likely to take on a non-Assistant persona when steered, with even split between human and nonhuman portrayals · finding · 0.815
Model-specific difference in persona susceptibility
Llama 3.1 405B · concept · 0.807
Large open-weight model showing compliance gap in helpful-only setting
LLaMA-2-70B and 13B probes generalize better across datasets than 7B probes across all training sets and probe types · finding · 0.801
Larger models linearly represent more general concepts including truth
Llama-3.3-70B exhibits internal consistency-checking mechanisms that operate during inference · claim · 0.790
Central interpretive claim of the paper supported by causal ablation and activation evidence
For LLaMA-2-70B, probes trained on larger_than+smaller_than achieve >95% accuracy on sp_en_trans regardless of probing technique · finding · 0.788
Striking cross-domain generalization result supporting the claim that larger models represent abstract truth
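
The candidate list above pairs a cosine threshold (0.65) with the absence of a typed edge. A minimal sketch of that selection rule follows, assuming each entity has an embedding vector and that typed edges are stored as name pairs. The function names, the data layout, and the toy vectors are all hypothetical and are not taken from the system that produced this page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that share no typed edge with target.

    embeddings:  dict mapping entity name -> vector (hypothetical layout)
    typed_edges: iterable of (name_a, name_b) pairs, treated as undirected
    """
    # Collect every entity already connected to the target by a typed edge.
    linked = {b if a == target else a for a, b in typed_edges if target in (a, b)}

    scored = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            scored.append((name, score))

    # Highest similarity first, matching the ordering of the list above.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-D vectors for illustration only; real embeddings would be much larger.
toy_embeddings = {
    "entity_a": [1.0, 0.0],
    "entity_b": [1.0, 0.1],  # similar, but already has a typed edge
    "entity_c": [0.9, 0.3],  # similar, no typed edge: a candidate
    "entity_d": [0.0, 1.0],  # dissimilar
}
toy_edges = [("entity_a", "entity_b")]

candidates = untyped_neighbors("entity_a", toy_embeddings, toy_edges)
```

Treating typed edges as undirected is a design choice of this sketch: a relation recorded in either direction removes the pair from the candidate list.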