LLaMA3.1-8B

One of four LLMs selected for representation analysis; embedding dimension D=4096; used as demonstration model in scatter plots.

Neighborhood — ranked by edge-count

concept

Llama-3.1-8B-Instruct
related_to
Primary qualitative demonstration model and one of 14 LLMs benchmarked
Meta-Llama-3.1-8B-Instruct
related_to
Backbone model used in E3 geometry analysis.
LLaMA 3.3 70B
related_to
The model used in Experiment 2 for SAE feature steering via the Goodfire API
LLaMA3.1-70B
related_to
One of four LLMs selected; larger model with D=8192 embedding dimension; analyzed across proportionally aligned layers.
Llama 3.1 405B
related_to
Large open-weight model showing compliance gap in helpful-only setting

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

LLaMA / LLaMA2 / LLaMA3 · concept · 0.842
Language model family used in cross-modal alignment experiments across multiple sizes
Llama-3.2-3B-Instruct · concept · 0.841
3B Llama model tested; used for injection stride visualization
Llama-3.3-70B-Instruct · concept · 0.827
Primary model of interest showing substantial ESR; largest model tested in the study
Announcing Open-Source SAEs for Llama 3.3 70B and Llama 3.1 8B (Balsam et al., 2025) · concept · 0.809
Goodfire blog post describing SAEs used for Llama models in this study
LLaMA-3.1-8B: Sbmax = -1.896 ± 0.211, AUSN = -2.119 ± 0.198, peak layer ℓ* = 10 (median) · finding · 0.804
Seed-pooled geometry-only statistics (per-dev z units).
Llama-3.2-1B-Instruct · concept · 0.801
Smallest Llama model tested; benchmarked across all injection methods
Llama-3.3-70B exhibits internal consistency-checking mechanisms that operate during inference · claim · 0.786
Central interpretive claim of the paper supported by causal ablation and activation evidence
Llama 3.3 70B is the most likely to take on a non-Assistant persona when steered, with even split between human and nonhuman portrayals · finding · 0.783
Model-specific difference in persona susceptibility
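
The selection rule for this section (cosine ≥ 0.65, no typed edge to this entity) can be illustrated with a minimal sketch. It is not the graph's actual implementation: the function names, the toy three-dimensional embeddings, and the edge representation as unordered name pairs are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that have no typed
    edge to the target, sorted by descending similarity.
    Hypothetical helper; the name and signature are assumptions."""
    # Entities already connected to the target by a typed edge are excluded.
    linked = {b if a == target else a for a, b in typed_edges if target in (a, b)}
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy data: made-up vectors, not real entity embeddings.
embeddings = {
    "LLaMA3.1-8B": [1.0, 0.0, 0.0],
    "LLaMA3.1-70B": [0.9, 0.1, 0.0],           # similar, but already has a typed edge
    "Llama-3.2-3B-Instruct": [0.8, 0.6, 0.0],  # similar, no typed edge
    "unrelated": [0.0, 0.0, 1.0],              # below the threshold
}
typed_edges = {("LLaMA3.1-8B", "LLaMA3.1-70B")}

candidates = untyped_neighbors("LLaMA3.1-8B", embeddings, typed_edges)
```

In this toy setup only Llama-3.2-3B-Instruct is returned: LLaMA3.1-70B is filtered out by its existing related_to edge, and the unrelated entity falls below the threshold.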