GPT-4o and GPT-4.1 nano used as LLM substrates for pilot experiments

Specification of AI models used in the two pilot experiments

Source paper

extracted_from

(2025) · Ruben Laukkonen · Fionn Inglis · Shamil Chandaria · Lars Sandved-Smith +4

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

GPT-4 · concept · 0.758
Large language model underlying ChatGPT and Bing Chat; used for illustrative quotes in the paper
GPT-4 Turbo and GPT-4o show no alignment faking in either setting due to insufficient detailed reasoning · finding · 0.755
Establishes that capacity for detailed reasoning is necessary for alignment faking
GPT-4V · concept · 0.742
Example of unified multimodal system handling both images and text with a combined architecture
GPT-4.1 · concept · 0.731
OpenAI model tested in Experiments 1, 3, 4; shows 100% experience reporting under self-referential induction
GPT-5.4 Nano self-bidding rate 74.6% · finding · 0.730
GPT5.4-N also exhibits a high self-bidding propensity.
GPT-4.1 reports subjective experience in 100% of self-referential trials vs. 0% in all control conditions · finding · 0.726
Specific result for GPT-4.1 in Experiment 1
H6: Proprietary post-training resists prompt override — GPT-5.4 shows more resistance than GPT-OSS. · hypothesis · 0.722
Exploratory hypothesis supported by GPT-5.4 vs GPT-OSS comparison
GPT-5.4 test-retest score delta is 1.00 (5.24 vs 4.24) across two battery runs on OpenRouter · finding · 0.721
API-routed models show ~1 point variance; individual scores should be treated as estimates