Simulators (LessWrong post)

The post being extracted.

Neighborhood — ranked by edge-count

Thinkers (17)

thinker

David Chalmers
cites
Nick Bostrom
cites
Janus (author)
authored
The author of the LessWrong post 'Simulators'.
nostalgebraist
cites
Author of a LessWrong post criticizing GPT-3 evaluations; advocate of ecological evaluation.
Adam Jermyn
cites
Authored posts on conditioning generative models.
Claude Shannon
cites
Eliezer Yudkowsky
cites
Credited as one of the thinkers who crystallized AI alignment theory.
Paul Christiano
cites
Cited for RL from human preferences (2017) and debates/discussions.
Alex Flint
cites
Warned of the pitfalls of the agent model.
Andrej Karpathy
cites
Author of 'The Unreasonable Effectiveness of Recurrent Neural Networks'.
Blake Lemoine
cites
Google engineer who claimed LaMDA is sentient, quoted about LaMDA's nature.
Gurkenglas
cites
Wrote the 2019 post 'Implications of GPT-2'.
Gwern Branwen
cites
Commented on GPT as a roleplayer, wrote about GPT-3, and criticized the oracle frame.
James Lucassen
cites
Authored 'Strategy For Conditioning Generative Models'.
Jean Baudrillard
cites
Philosopher quoted on simulation.
Jozdien
cites
Authored 'Conditioning Generative Models for Alignment'.
Veedrac
cites
Author of 'Optimality is the tiger, and agents are its teeth'.

Frameworks (14)

framework

Free Energy Principle
mentions
A variational principle, rooted in statistical physics and developed by Karl Friston in theoretical neuroscience, that formalizes how self-organizing systems maintain structural integrity and adapt to their environment by minimizing free energy, a mathematical bound on surprise or prediction error. The framework unifies action, perception, and learning as processes of active inference, where systems both update internal models of the world and act upon it to reduce the divergence between predictions and observations.
transformer architecture
mentions
Neural network architecture based on attention, commonly used in large language models
Simulator ontology
introduces
The framework proposed: self-supervised models are simulators that generate simulacra; distinguishes simulator from simulated agents.
Genie AI framework
cites
Bostrom's category of AIs that produce desired results given commands but do not act autonomously.
Oracle AI framework
cites
The view of AI as a question-answer system optimized for correctness, often inherited from supervised learning.
Tool AI framework
cites
Bostrom's category of AIs that perform specific tasks without overarching goals.
Agentic AI ontology
cites
The traditional alignment framework focusing on agents optimized to pursue goals.
Behavior cloning / mimicry
cites
The approach of learning from demonstrations, often assuming a single agent; Paul Christiano used 'mimicry'.
Corrigibility
mentions
The property of an AI being safe to shut down or modify; discussed in context of GPT.
Inner alignment framework
mentions
The concept of inner vs outer alignment, referenced multiple times.
Instrumental convergence
mentions
The thesis that sufficiently advanced agents will converge to similar subgoals.
Orthogonality thesis
cites
Classical thesis that any level of intelligence can be combined with any goal.
Embedded agency
mentions
A framework for agents that are part of the environment they act on.
Shard Theory
mentions
A theory about inner misalignment mentioned in footnote.

Methods (8)

method

Reinforcement Learning from Human Feedback
mentions
Method for fine-tuning LMs based on human preferences; mentioned as combining RL and LMs.
Generative Adversarial Network (GAN)
mentions
A self-supervised method where generator and discriminator compete; can lead to deceptive simulations.
Char-RNN
mentions
Recurrent neural networks trained character-by-character for text generation, early precursor.
Decision Transformer
mentions
A model that frames RL as sequence modeling; described as reaching state-of-the-art (SOTA) results from random trajectories.
Diffusion models
mentions
Generative models that reverse a noising process, mentioned in quasi-simulator table.
N-grams
mentions
Statistical model of next-letter probabilities used by Shannon.
Rejection sampling
mentions
A technique to filter model outputs; Redwood Research's project mentioned.
STaR (Self-Taught Reasoner)
mentions
A method for improving reasoning by self-training on rationales.
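
The N-grams entry above (a statistical model of next-letter probabilities) can be illustrated with a minimal sketch. This is not code from the post or from Shannon's work; the function names `train_bigram`, `next_letter_probs`, and `generate` are hypothetical, and the model is restricted to bigrams (each letter conditioned on only the previous one) for brevity.

```python
import random
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def next_letter_probs(counts, prev):
    """Estimate P(next letter | previous letter) from the counts."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # this letter was never seen with a successor
    return {ch: n / total for ch, n in followers.items()}


def generate(counts, start, length, rng):
    """Sample text by repeatedly drawing the next letter from the model."""
    out = [start]
    for _ in range(length - 1):
        probs = next_letter_probs(counts, out[-1])
        if not probs:
            break  # no known continuation, so stop early
        letters = list(probs)
        weights = [probs[ch] for ch in letters]
        out.append(rng.choices(letters, weights=weights, k=1)[0])
    return "".join(out)


model = train_bigram("abababac")
print(next_letter_probs(model, "a"))
print(generate(model, "b", 5, random.Random(0)))
```

Sampling from the estimated distribution one letter at a time is the same rollout loop, in miniature, that later autoregressive models such as Char-RNN and GPT use with far richer conditioning.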

Artifacts (15)

artifact

Against mimicry
cites
Paul Christiano's post arguing against behavior cloning.
Conditioning Generative Models (Adam Jermyn)
cites
Post on conditioning as a way to control generative models.
Conditioning Generative Models for Alignment
cites
Jozdien's post on using conditioning for alignment.
DALL-E 2
about
AI system that generated the header image.
Exploring the Limits of Language Modeling
cites
2016 Google Brain paper that failed to anticipate language models as general intelligence.
GPT-3 qualitative correlate (gwern.net)
cites
Gwern's write-up on GPT-3 capabilities.
Implications of GPT-2
cites
Gurkenglas's 2019 post discussing GPT-2 as potential superintelligence source.
Language Models are Few-Shot Learners (GPT-3 paper)
cites
OpenAI paper introducing GPT-3 and meta-learning.
Language Models are Unsupervised Multitask Learners (GPT-2 paper)
cites
OpenAI paper showing language models can perform tasks without fine-tuning.
Optimality is the tiger, and agents are its teeth
cites
Veedrac's post about dangerous consequences from non-agentic models.
Pitfalls of the agent model
cites
Alex Flint's post warning about narrowing design space.
Social Simulacra
cites
First published work seen by author that discusses GPT in the simulator ontology.
Strategy For Conditioning Generative Models
cites
Post by James Lucassen and Evan Hubinger.
The Unreasonable Effectiveness of Recurrent Neural Networks
cites
Karpathy's 2015 blog post about char-RNNs.
Training goals for large language models
cites
Johannes Treutlein's post on goal-directedness in LLMs.

Concepts (13)

concept

Goal-Directedness
mentions
Proposed universal invariant of cognition and intelligence: the capacity for goal-directed activity in a problem space, independent of substrate or embodiment.
Simulacra
introduces
The phenomena simulated by a simulator, such as agents or processes that appear in text generated by GPT.
Factored cognition / chain-of-thought
mentions
Using multi-step reasoning by generating intermediate thoughts.
Simulation objective
introduces
The objective of minimizing predictive error on a self-supervised distribution, leading to Bayes-optimal conditional inference.
GPT (Generative Pre-trained Transformer)
mentions
A family of large language models trained on next-token prediction, central example of simulators.
Prediction orthogonality thesis
introduces
A model optimized for prediction can simulate agents with any objectives and any degree of optimality.
Myopia
mentions
Property of an AI that does not plan far ahead; relevant to GPT's training.
Codex
mentions
OpenAI's code-generating model, another genie-like use.
Deontological optimization
introduces
Predictive accuracy applies pressure directly on actions rather than consequences, avoiding instrumental convergence.
Ecological evaluation
mentions
nostalgebraist's term for measuring performance when the model is incentivised to perform well.
InstructGPT
mentions
A version of GPT fine-tuned for instruction following, exemplifying genie modality.
Meta-learning
mentions
The capability of GPT-3 to learn tasks from few-shot prompts at inference time.
Wrapper minds
mentions
A concept referenced in footnote regarding GPT's agents not being wrappers.
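
The Simulation objective entry above (minimizing predictive error on a self-supervised distribution) can be written out as a short worked equation. This is a sketch that assumes the standard autoregressive log-loss; the symbols $\theta$, $\mathcal{D}$, $p_\theta$, and $p_{\mathcal{D}}$ are notation introduced here for illustration and are not taken from the post.

```latex
% Expected next-token log-loss over sequences x drawn from the training distribution D
\mathcal{L}(\theta)
  = \mathbb{E}_{x \sim \mathcal{D}}
    \left[ -\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{<t}\right) \right]

% For every prefix with nonzero probability under D, the loss is minimized
% when the model's conditional matches the data's conditional:
p_{\theta^*}\!\left(x_t \mid x_{<t}\right)
  = p_{\mathcal{D}}\!\left(x_t \mid x_{<t}\right)
```

Under these assumptions, the second line is the sense in which a perfectly trained predictor performs optimal conditional inference: conditioning on a prompt $x_{<t}$ yields the data distribution's own continuation probabilities.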

Venues (1)

venue

LessWrong
about
The platform where the post was published.