claim

active

claim:gpt-does-not-generate-rollouts-during-training-so-there-is-no-reason-to-expect-that-gpt-will-form-preferences-over-the-consequences-of-its-output-related-to-the-text-prediction-objective

GPT does not generate rollouts during training, so there is no reason to expect that GPT will form preferences over the consequences of its output related to the text prediction objective.

Argues against instrumental convergence in GPT.

Source paper

extracted_from

Simulators — LessWrong

Neighborhood — ranked by edge-count

Frameworks (1)

framework

Instrumental convergence
contradicts
The thesis that sufficiently advanced agents will converge to similar subgoals.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

GPT is corrigible in a negative sense because the agent specification (prompt) is not fixed by the policy and the policy lacks direct training incentives to control its prompt. (claim · 0.813)
GPT's corrigibility explained.
GPT's ability to simulate text automata is the source of its most surprising and pivotal implications for paths to superintelligence. (claim · 0.809)
Importance of recursive generation.
Treating GPT as an unsupervised implementation of a supervised learner leads to systematic underestimation of capabilities. (claim · 0.797)
Critique of the oracle/supervised frame.
GPT doesn't seem to care which agent it simulates, nor if the scene ends and the agent is effectively destroyed. (claim · 0.794)
Illustrates the simulator-simulacra distinction.
What we call GPT's 'downstream behavior' is the behavior of simulacra; it is primarily through simulacra that GPT has potential to perform meaningful work. (claim · 0.780)
Clarifies where agency resides.
GPT, insofar as it is inner-aligned, is a simulator which can simulate agentic and non-agentic simulacra. (claim · 0.773)
Central thesis of the post.
GPT is behavior cloning, but it is the behavior of a universe that is cloned, not of a single demonstrator. (claim · 0.758)
Broadening behavior cloning to universal simulation.
Can GPT write its successor? (question · 0.757)
Disambiguation exercise.