GPT-4 Turbo

OpenAI model tested; shows no alignment faking due to insufficient detailed reasoning

Neighborhood — ranked by edge-count

concept

GPT-4
related_to
Large language model underlying ChatGPT and Bing Chat; used for illustrative quotes in the paper
GPT-4.1
related_to
OpenAI model tested in Experiments 1, 3, 4; shows 100% experience reporting under self-referential induction

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

GPT-4V · concept · 0.844
Example of a unified multimodal system that handles both images and text with a combined architecture
GPT-3 · concept · 0.805
Large language model cited as an example; also used in Andreas 2022 for preliminary evidence
GPT-4 Scenario Generation · method · 0.803
GPT-4 was used to generate unique variations of cheap/expensive items and room names for the test dataset
GPT-2 · concept · 0.791
Early large language model cited as an example of transformer-based LLMs
GPT-4 Turbo and GPT-4o show no alignment faking in either setting due to insufficient detailed reasoning · finding · 0.790
Establishes that capacity for detailed reasoning is necessary for alignment faking
GPT-OSS 120B · concept · 0.758
One of two large reasoning models analyzed in the paper for performative vs genuine CoT behavior
GPT (Generative Pre-trained Transformer) · concept · 0.746
A family of large language models trained on next-token prediction, central example of simulators.
GPT-5.1 SJT Response Scoring · method · 0.716
Frontier LLM used at temperature 0 to score SJT responses on a 1-5 Likert scale, conditioned on the construct definition and the SJT stem
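
The "cosine ≥ 0.65 · no typed edge" list above can be read as a simple filter: score every other entity against this one by cosine similarity of their embeddings, drop those that already have a typed edge, and keep the rest at or above the threshold, highest score first. The sketch below illustrates that reading only. The function names, the dictionary layout, and the toy two-dimensional vectors are all assumptions made for illustration; the page does not say how its embeddings are computed or stored.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities similar to `target`
    that have no typed edge to it, sorted by descending score.

    `embeddings` maps entity name -> vector (hypothetical layout).
    `typed_edges` is the set of names already linked to `target`.
    """
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in typed_edges:
            continue  # skip the entity itself and already-linked entities
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy data: names and vectors are made up, not taken from the graph.
toy_embeddings = {
    "entity_a": [1.0, 0.0],
    "entity_b": [1.0, 0.0],  # identical direction, but already has a typed edge
    "entity_c": [0.8, 0.6],  # similar, no typed edge
    "entity_d": [0.0, 1.0],  # orthogonal, well below the threshold
}
candidates = untyped_neighbors("entity_a", toy_embeddings, typed_edges={"entity_b"})
for name, score in candidates:
    print(f"{name} · {score:.3f}")
```

Entries that survive this filter are the "candidates for new edges or unrecognized duplicates": a high score with no typed relation suggests either a missing link or two records describing the same thing.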