artifact
active
artifact:simulators-lesswrong-post
Simulators (LessWrong post)
The paper being extracted.
Neighborhood — ranked by edge-count
Thinkers (17)
thinker
- David Chalmers · cites
- Nick Bostrom · cites
- Janus (author) · authored · The author of the LessWrong post 'Simulators'.
- nostalgebraist · cites · Author of LessWrong post deriding GPT-3 evaluations, advocate of ecological evaluation.
- Adam Jermyn · cites · Authored posts on conditioning generative models.
- Claude Shannon · cites
- Eliezer Yudkowsky · cites · Co-crystallized AI alignment theory.
- Paul Christiano · cites · Cited for RL from human preferences (2017) and debates/discussions.
- Alex Flint · cites · Warned of the pitfalls of the agent model.
- Andrej Karpathy · cites · Author of 'The Unreasonable Effectiveness of Recurrent Neural Networks'.
- Blake Lemoine · cites · Google engineer who claimed LaMDA is sentient, quoted about LaMDA's nature.
- Gurkenglas · cites · Wrote the 2019 post 'Implications of GPT-2'.
- Gwern Branwen · cites · Commented on GPT as roleplayer, wrote about GPT-3, and complained about the oracle frame.
- James Lucassen · cites · Authored 'Strategy For Conditioning Generative Models'.
- Jean Baudrillard · cites · Philosopher quoted on simulation.
- Jozdien · cites · Authored 'Conditioning Generative Models for Alignment'.
- Veedrac · cites · Author of 'Optimality is the tiger, and agents are its teeth'.
Frameworks (14)
framework
- Free Energy Principle · mentions · A foundational variational principle from statistical physics that formalizes how self-organizing systems maintain structural integrity and adapt to their environment by minimizing free energy, a mathematical bound on surprise or prediction error. Originally developed by Karl Friston, the framework unifies action, perception, and learning as processes of active inference, where systems both update internal models of the world and act upon it to reduce the divergence between predictions and observations.
- transformer architecture · mentions · Neural network architecture based on attention, commonly used in large language models.
- Simulator ontology · introduces · The framework proposed: self-supervised models are simulators that generate simulacra; distinguishes simulator from simulated agents.
- Genie AI framework · cites · Bostrom's category of AIs that produce desired results given commands but do not act autonomously.
- Oracle AI framework · cites · The view of AI as a question-answer system optimized for correctness, often inherited from supervised learning.
- Tool AI framework · cites · Bostrom's category of AIs that perform specific tasks without overarching goals.
- Agentic AI ontology · cites · The traditional alignment framework focusing on agents optimized to pursue goals.
- The approach of learning from demonstrations, often assuming a single agent; Paul Christiano used 'mimicry'.
- Corrigibility · mentions · The property of an AI being safe to shut down or modify; discussed in context of GPT.
- Inner alignment framework · mentions · The concept of inner vs outer alignment, referenced multiple times.
- Instrumental convergence · mentions · The thesis that sufficiently advanced agents will converge to similar subgoals.
- Orthogonality thesis · cites · Classical thesis that any level of intelligence can be combined with any goal.
- Embedded agency · mentions · A framework for agents that are part of the environment they act on.
- Shard Theory · mentions · A theory about inner misalignment mentioned in footnote.
Methods (8)
method
- Method for fine-tuning LMs based on human preferences; mentioned as combining RL and LMs.
- A self-supervised method where generator and discriminator compete; can lead to deceptive simulations.
- Char-RNN · mentions · Recurrent neural networks trained character-by-character for text generation, early precursor.
- Decision Transformer · mentions · A model that frames RL as sequence modeling, SOTA from random trajectories.
- Diffusion models · mentions · Generative models that reverse a noising process, mentioned in quasi-simulator table.
- N-grams · mentions · Statistical model of next-letter probabilities used by Shannon.
- Rejection sampling · mentions · A technique to filter model outputs; Redwood Research's project mentioned.
- STaR (Self-Taught Reasoner) · mentions · A method for improving reasoning by self-training on rationales.
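Two of the methods listed above, n-grams and rejection sampling, are small enough to sketch directly. The following is a minimal illustration, not code from the post: the function names (`train_ngram`, `next_char_probs`, `generate`, `rejection_sample`) and the `accept` filter are hypothetical, and a character-level model stands in for any generative model whose outputs are being filtered.

```python
import random
from collections import Counter, defaultdict


def train_ngram(text, n=2):
    """Count which character follows each (n-1)-character context. Assumes n >= 2."""
    counts = defaultdict(Counter)
    for i in range(len(text) - n + 1):
        context, nxt = text[i:i + n - 1], text[i + n - 1]
        counts[context][nxt] += 1
    return counts


def next_char_probs(counts, context):
    """Turn raw counts for one context into next-letter probabilities."""
    options = counts.get(context, {})
    total = sum(options.values())
    return {ch: c / total for ch, c in options.items()} if total else {}


def generate(counts, start, length, n=2, rng=random):
    """Sample up to `length` extra characters, one at a time, from the n-gram counts."""
    out = start
    for _ in range(length):
        options = counts.get(out[-(n - 1):])
        if not options:  # unseen context: stop early
            break
        chars, weights = zip(*options.items())
        out += rng.choices(chars, weights=weights)[0]
    return out


def rejection_sample(counts, start, length, accept, n=2, rng=random, max_tries=1000):
    """Draw samples until one passes the (hypothetical) `accept` filter; None if none does."""
    for _ in range(max_tries):
        candidate = generate(counts, start, length, n=n, rng=rng)
        if accept(candidate):
            return candidate
    return None
```

For example, `rejection_sample(train_ngram(corpus), "t", 20, lambda s: "x" not in s)` would keep drawing 20-character continuations of "t" until one contains no "x". The filter only discards outputs; it does not change the underlying model.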
Artifacts (15)
artifact
- Against mimicry · cites · Paul Christiano's post arguing against behavior cloning.
- Post on conditioning as a way to control generative models.
- Jozdien's post on using conditioning for alignment.
- DALL-E 2 · about · AI system that generated the header image.
- 2016 Google Brain paper that failed to anticipate language models as general intelligence.
- Gwern's write-up on GPT-3 capabilities.
- Gurkenglas's 2019 post discussing GPT-2 as potential superintelligence source.
- OpenAI paper introducing GPT-3 and meta-learning.
- OpenAI paper showing language models can perform tasks without fine-tuning.
- Veedrac's post about dangerous consequences from non-agentic models.
- Alex Flint's post warning about narrowing design space.
- Social Simulacra · cites · First published work seen by author that discusses GPT in the simulator ontology.
- Post by James Lucassen and Evan Hubinger.
- Karpathy's 2015 blog post about char-RNNs.
- Johannes Treutlein's post on goal-directedness in LLMs.
Concepts (13)
concept
- Goal-Directedness · mentions · Proposed universal invariant of cognition and intelligence: capacity for goal-directed activity in a problem space, independent of substrate or embodiment.
- Simulacra · introduces · The phenomena simulated by a simulator, such as agents or processes that appear in text generated by GPT.
- Using multi-step reasoning by generating intermediate thoughts.
- Simulation objective · introduces · The objective of minimizing predictive error on a self-supervised distribution, leading to Bayes-optimal conditional inference.
- A family of large language models trained on next-token prediction, central example of simulators.
- Prediction orthogonality thesis · introduces · A model optimized for prediction can simulate agents with any objectives and any degree of optimality.
- Myopia · mentions · Property of an AI that does not plan far ahead; relevant to GPT's training.
- Codex · mentions · OpenAI's code-generating model, another genie-like use.
- Deontological optimization · introduces · Predictive accuracy applies pressure directly on actions rather than consequences, avoiding instrumental convergence.
- Ecological evaluation · mentions · nostalgebraist's term for measuring performance when the model is incentivised to perform well.
- InstructGPT · mentions · A version of GPT fine-tuned for instruction following, exemplifying genie modality.
- Meta-learning · mentions · The capability of GPT-3 to learn tasks from few-shot prompts during runtime.
- Wrapper minds · mentions · A concept referenced in footnote regarding GPT's agents not being wrappers.
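The "Simulation objective" entry above can be written out under one common reading. This is a sketch that assumes "predictive error" means log loss (cross-entropy) on next-token prediction; the symbols $p_{\text{data}}$, $q_\theta$, and $x_{<t}$ are notation introduced here, not taken from the entry.

```latex
% Expected log loss of a model q_theta on sequences drawn from the training distribution
\mathcal{L}(\theta) \;=\; -\,\mathbb{E}_{x \sim p_{\text{data}}}
  \left[ \sum_{t} \log q_\theta\!\left(x_t \mid x_{<t}\right) \right]

% The loss is minimized when the model matches the true conditional
% on every context that occurs with nonzero probability
q_{\theta^\ast}\!\left(x_t \mid x_{<t}\right) \;=\; p_{\text{data}}\!\left(x_t \mid x_{<t}\right)
```

Under this reading, the minimizer is the conditional distribution of the training data itself, which is the sense in which the objective leads to Bayes-optimal conditional inference.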
Venues (1)
venue
- LessWrong · about · The platform where the post was published.