question

active

question:is-gpt-corrigible

Is GPT corrigible?

Disambiguation exercise.

Source paper

extracted_from

Simulators — LessWrong

Neighborhood — ranked by edge-count

Frameworks (1)

framework

Corrigibility
associated_with
The property of an AI being safe to shut down or modify; discussed in the context of GPT.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

GPT is corrigible in a negative sense because the agent specification (prompt) is not fixed by the policy and the policy lacks direct training incentives to control its prompt. · claim · 0.842
GPT's corrigibility explained.
Is GPT delusional? · question · 0.831
Disambiguation exercise.
Can GPT distinguish correlation and causality? · question · 0.812
Disambiguation exercise.
Does GPT have superhuman knowledge? · question · 0.807
Disambiguation exercise.
Is GPT pretending to be stupider than it is? · question · 0.791
Disambiguation exercise.
Is GPT myopic? · question · 0.782
Disambiguation exercise.
Can GPT write its successor? · question · 0.781
Disambiguation exercise.
GPT-4.1 · concept · 0.780
OpenAI model tested in Experiments 1, 3, 4; shows 100% experience reporting under self-referential induction