Llama-3.3-70B-Instruct (concept:llama-3-3-70b-instruct)
Type: concept · Status: active
Primary model of interest showing substantial ESR; largest model tested in the study
Neighborhood (ranked by edge count)
Papers (1)
Concepts (7)
- Llama-3.1-8B-Instruct (related_to): Primary qualitative demonstration model and one of 14 LLMs benchmarked
- Meta-Llama-3.1-8B-Instruct (related_to): Backbone model used in the E3 geometry analysis
- Llama-3.2-3B-Instruct (related_to): 3B Llama model tested; used for injection stride visualization
- LLaMA 3.3 70B (related_to): The model used in Experiment 2 for SAE feature steering experiments via the Goodfire API
- Llama-3.2-1B-Instruct (related_to): Smallest Llama model tested; benchmarked across all injection methods
- LLaMA3.1-70B (related_to): One of four LLMs selected; larger model with embedding dimension D=8192; analyzed across proportionally aligned layers
- Goodfire SAEs for Llama-3 (associated_with): Open-source SAEs from Goodfire used for Llama model experiments, trained on instruction-tuned models
Findings (1)
- Cross-judge validation of the primary ESR finding across OpenAI, Alibaba, Anthropic, and Google judge models
Related by similarity (8)
Criteria: cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- 7B OLMo model tested; used for layerwise steering visualization (Figure 4)
- One of four LLMs selected for representation analysis; embedding dimension D=4096; used as demonstration model in scatter plots.
- 32B OLMo model quantized to 4-bit NF4; tested in OCEAN benchmarks
- Model-specific difference in persona susceptibility
- Large open-weight model showing compliance gap in helpful-only setting
- Larger models linearly represent more general concepts including truth
- Llama-3.3-70B exhibits internal consistency-checking mechanisms that operate during inference (claim, score 0.790): Central interpretive claim of the paper, supported by causal ablation and activation evidence
- Striking cross-domain generalization result supporting the claim that larger models represent abstract truth
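The "Related by similarity" filter (cosine ≥ 0.65, no typed edge) can be sketched as below. This is a minimal sketch assuming each entity has a precomputed embedding vector and that the focal entity's typed neighbors are known as a set of IDs; the function names, data layout, and toy IDs are hypothetical and do not reflect how this graph tool actually computes the list.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_similar(focal, embeddings, typed_neighbors, threshold=0.65):
    """Return (entity_id, score) pairs whose cosine similarity to `focal` is
    >= threshold and that share no typed edge with it, highest score first.

    embeddings: hypothetical dict mapping entity ID -> embedding vector.
    typed_neighbors: set of IDs already linked to `focal` by a typed edge.
    """
    focal_vec = embeddings[focal]
    hits = []
    for eid, vec in embeddings.items():
        # Skip the entity itself and anything already connected by a typed edge.
        if eid == focal or eid in typed_neighbors:
            continue
        score = cosine(focal_vec, vec)
        if score >= threshold:
            hits.append((eid, score))
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits


if __name__ == "__main__":
    # Toy 2-D embeddings with hypothetical IDs.
    emb = {
        "focal": [1.0, 0.0],
        "a": [1.0, 0.0],  # identical vector, but already has a typed edge
        "b": [0.8, 0.6],  # cosine 0.8, passes the filter
        "c": [0.0, 1.0],  # orthogonal, cosine 0
        "d": [0.6, 0.8],  # cosine 0.6, below the 0.65 threshold
    }
    # Only "b" survives: "a" is excluded as a typed neighbor, "c" and "d" score too low.
    print(untyped_similar("focal", emb, typed_neighbors={"a"}))
```

Each returned ID would be a candidate for a new typed edge or a possible duplicate. Sorting by score is an illustrative choice; the page does not state how its eight entries are ordered.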