method
active
method:gpt-5-1-sjt-response-scoring
GPT-5.1 SJT Response Scoring
Frontier LLM run at temperature 0 to score situational judgment test (SJT) responses on a 1-5 Likert scale, conditioned on the construct definition and the SJT stem.
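The scoring setup described above can be sketched as follows. This is a minimal illustration, not the prompt or client code used in the source. The prompt wording, the Likert anchor labels, and the names `build_scoring_prompt`, `parse_likert`, `score_response`, and `call_model` are all assumptions introduced here. `call_model` stands in for whatever client sends a prompt to the model and returns its text reply.

```python
import re
from typing import Callable


def build_scoring_prompt(construct_definition: str, sjt_stem: str, response: str) -> str:
    """Assemble a prompt that conditions the rating on the construct and the stem.

    The wording and the anchor labels (1 = not at all, 5 = very strongly)
    are illustrative assumptions, not taken from the source.
    """
    return (
        "You are rating a response to a situational judgment test (SJT) item.\n\n"
        f"Construct definition:\n{construct_definition}\n\n"
        f"SJT stem:\n{sjt_stem}\n\n"
        f"Response to rate:\n{response}\n\n"
        "Rate how strongly the response reflects the construct on a 1-5 Likert "
        "scale (1 = not at all, 5 = very strongly). Reply with a single integer."
    )


def parse_likert(reply: str) -> int:
    """Return the first standalone digit 1-5 in the reply, or raise ValueError."""
    match = re.search(r"\b([1-5])\b", reply)
    if match is None:
        raise ValueError(f"no 1-5 score found in reply: {reply!r}")
    return int(match.group(1))


def score_response(
    call_model: Callable[..., str],
    construct_definition: str,
    sjt_stem: str,
    response: str,
) -> int:
    """Score one SJT response with a caller-supplied model function.

    `call_model` is a hypothetical callable taking (prompt, temperature=...)
    and returning the model's text. Temperature is fixed at 0 to reduce
    sampling variance between runs.
    """
    prompt = build_scoring_prompt(construct_definition, sjt_stem, response)
    reply = call_model(prompt, temperature=0.0)
    return parse_likert(reply)
```

Passing the model call in as a function keeps the sketch independent of any particular API client and lets the parsing logic be checked with a stub.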
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- OpenAI model tested in Experiments 1, 3, 4; shows 100% experience reporting under self-referential induction
- Large language model underlying ChatGPT and Bing Chat; used for illustrative quotes in the paper
- GPT-4.1 reports subjective experience in 100% of self-referential trials vs. 0% in all control conditions · finding · 0.750 · Specific result for GPT-4.1 in Experiment 1
- Early large language model cited as an example of transformer-based LLMs
- Example of unified multimodal system handling both images and text with a combined architecture
- GPT-5.4 test-retest score delta is 1.00 (5.24 vs 4.24) across two battery runs on OpenRouter · finding · 0.743 · API-routed models show ~1 point variance; individual scores should be treated as estimates
- Similarly poor against code agents.
- Disambiguation exercise.