hypothesis

active

hypothesis:why-mechanistically-should-mesaoptimizers-form-in-predictive-learning-versus-for-instance-in-reinforcement-learning-or-gans

Why mechanistically should mesaoptimizers form in predictive learning, versus for instance in reinforcement learning or GANs?

Open research question.

Source paper

extracted_from

Simulators — LessWrong

Neighborhood — ranked by edge-count

Papers (1)

paper

Simulators — LessWrong
mentions

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Associative learning criterion can occur in gene regulatory networks and non-neural morphogenetic agents · hypothesis · 0.766
What makes learning systems smart is that the parameters they adjust and the data to which they fit are not in the same space. · claim · 0.748
Distillation of why learning generalises.
If cellular collectives are learning agents, then reinforcement learning protocols should be able to train tissues to produce specific morphologies. · hypothesis · 0.748
Ongoing experimental test: using rewards and punishments to shape anatomical outcomes without micromanaging molecular pathways.
Certain forms of reinforcement learning from human feedback can actually exacerbate, rather than mitigate, the tendency for LLM-based dialogue agents to express a desire for self-preservation · claim · 0.746
Empirically grounded claim citing Perez et al. 2022, showing RLHF can backfire on the self-preservation dimension
successful agents exhibited causal emergence that was consistently predictive of final reward early in training and whose representational dynamics aligned with reward improvement in most tasks. · quote · 0.745
Load-bearing summary of the main empirical finding that anchors the Causally Emergent Alignment Hypothesis.
Biological agents increase causal emergence after learning new memories. · claim · 0.745
Prior empirical observation from biological systems; motivates investigation in artificial agents.
Results may not fully generalize to all models and scenarios because the model organism relies on hints and nudges and Llama Nemotron cannot consistently distinguish evaluation/deployment based on subtle cues · claim · 0.743
Key limitation acknowledged by authors.
There are fewer representations competent for N tasks than M<N tasks, so training more general models should yield fewer possible solutions · hypothesis · 0.742
Selective pressure toward convergence via task generality