claim

active

claim:optimizing-toward-the-simulation-objective-does-not-incentivize-instrumentally-convergent-behaviors-the-way-that-reward-functions-which-evaluate-trajectories-do

Optimizing toward the simulation objective does not incentivize instrumentally convergent behaviors the way that reward functions which evaluate trajectories do.

Deontological nature of predictive loss.

Source paper

extracted_from

Simulators — LessWrong

Neighborhood — ranked by edge-count

Concepts (1)

concept

Simulation objective
supports
The objective of minimizing predictive error on a self-supervised distribution, leading to Bayes-optimal conditional inference.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

The strict version of the simulation objective is optimized by the actual time evolution rule that created the training samples.
claim · 0.804
Equivalence of optimal predictor to the physics of the data.
A model whose objective is prediction can simulate agents who optimize toward any objectives, with any degree of optimality (bounded above but not below by the model's power).
claim · 0.775
Prediction orthogonality thesis.
The outer objective of self-supervised learning is Bayes-optimal conditional inference, which I call the simulation objective.
claim · 0.770
Definition of simulation objective.
Acting to optimize value and perception are two aspects of exactly the same principle: minimization of free energy.
claim · 0.767
Foundational claim unifying action and perception within single optimization framework.
"acting to optimize value and perception are two aspects of exactly the same principle; namely the minimisation of a quantity [free energy] that bounds the probability of sensory input, given a particular agent or phenotype."
quote · 0.766
Concise statement of the free-energy principle's unification of action and perception.
"any goal or purpose can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)"
quote · 0.766
The reward hypothesis underpinning RL, quoted from Sutton and Barto.
Optimizing interventions in activation space to produce paths along M_y recovers activation trajectories that trace the curvature of M_h.
finding · 0.758
Demonstrates bidirectional causal link: behavior manifold geometry can be recovered by optimizing in representation space.
Special-purpose intelligences optimized for narrow tasks might not converge to the platonic representation.
claim · 0.753
Counterexample/limitation: only general-purpose models are subject to the convergence pressures described.