Llama-3.2-3B-Instruct

3B Llama model tested; used for injection stride visualization

Neighborhood, ranked by edge count

paper

concept

Llama-3.1-8B-Instruct · same_as · Primary qualitative demonstration model and one of 14 LLMs benchmarked
Llama-3.3-70B-Instruct · related_to · Primary model of interest showing substantial ESR; largest model tested in the study
Meta-Llama-3.1-8B-Instruct · related_to · Backbone model used in E3 geometry analysis
Llama-3.2-1B-Instruct · related_to · Smallest Llama model tested; benchmarked across all injection methods
Olmo-3-7B-Instruct · related_to · 7B OLMo model tested; used for layerwise steering visualization (Figure 4)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood that have no typed relation to this one. These are candidates for new edges or unrecognized duplicates.
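A minimal sketch of the filter described above, assuming each entity already has a precomputed cosine similarity to Llama-3.2-3B-Instruct and that the typed neighbors are available as a set of names. The function name `untyped_neighbors`, its input shapes, and the 0.90 and 0.40 scores in the example are hypothetical illustrations, not the graph tool's actual API or data.

```python
from typing import Dict, List, Set, Tuple


def untyped_neighbors(
    scores: Dict[str, float],
    typed_neighbors: Set[str],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: keep entities with cosine >= threshold and no typed edge."""
    candidates = [
        (name, score)
        for name, score in scores.items()
        # Keep only entities that are similar enough and not already linked by a typed edge.
        if score >= threshold and name not in typed_neighbors
    ]
    # Highest similarity first.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates


if __name__ == "__main__":
    scores = {
        "Olmo-3.1-32B-Instruct": 0.847,
        "LLaMA3.1-8B": 0.841,
        "Llama-3.1-8B-Instruct": 0.90,  # hypothetical score; excluded because it has a typed edge
        "Unrelated entity": 0.40,       # hypothetical entity below the threshold
    }
    typed = {"Llama-3.1-8B-Instruct"}
    print(untyped_neighbors(scores, typed))
    # [('Olmo-3.1-32B-Instruct', 0.847), ('LLaMA3.1-8B', 0.841)]
```

The threshold comparison is inclusive (`>=`), matching the "cosine ≥ 0.65" label.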

Olmo-3.1-32B-Instruct · concept · 0.847
32B OLMo model quantized to 4-bit NF4; tested in OCEAN benchmarks
LLaMA3.1-8B · concept · 0.841
One of four LLMs selected for representation analysis; embedding dimension D=4096; used as the demonstration model in scatter plots
LLaMA3.1-70B · concept · 0.835
One of four LLMs selected for representation analysis; larger model with embedding dimension D=8192; analyzed across proportionally aligned layers
Llama 3.1 405B · concept · 0.831
Large open-weight model showing a compliance gap in the helpful-only setting
LLaMA 3.3 70B · concept · 0.831
The model used in Experiment 2 for SAE feature steering via the Goodfire API
Llama 3.3 70B is the most likely to take on a non-Assistant persona when steered, with an even split between human and nonhuman portrayals · finding · 0.800
Model-specific difference in persona susceptibility
LLaMA / LLaMA2 / LLaMA3 · concept · 0.796
Language model family used in cross-modal alignment experiments across multiple sizes
Llama-3.3-70B exhibits internal consistency-checking mechanisms that operate during inference · claim · 0.781
Central interpretive claim of the paper, supported by causal ablation and activation evidence