artifact
active
artifact:simulators-lesswrong-post

Simulators (LessWrong post)

The post being extracted.

Neighborhood — ranked by edge-count

Thinkers (17)

thinker
  • The author of the LessWrong post 'Simulators'.
  • Author of a LessWrong post criticizing GPT-3 evaluations; advocate of ecological evaluation.
  • Authored posts on conditioning generative models.
  • Co-crystallized AI alignment theory.
  • Cited for RL from human preferences (2017) and debates/discussions.
  • Warned of the pitfalls of the agent model.
  • Author of 'The Unreasonable Effectiveness of Recurrent Neural Networks'.
  • Google engineer who claimed LaMDA is sentient, quoted about LaMDA's nature.
  • Wrote the 2019 post 'Implications of GPT-2'.
  • Commented on GPT as a roleplayer, wrote about GPT-3, and complained about the oracle frame.
  • Authored 'Strategy For Conditioning Generative Models'.
  • Philosopher quoted on simulation.
  • Jozdien
    cites
    Authored 'Conditioning Generative Models for Alignment'.
  • Veedrac
    cites
    Author of 'Optimality is the tiger, and agents are its teeth'.

Frameworks (14)

framework
  • A foundational variational principle from statistical physics that formalizes how self-organizing systems maintain structural integrity and adapt to their environment by minimizing free energy—a mathematical bound on surprise or prediction error. Originally developed by Karl Friston, the framework unifies action, perception, and learning as processes of active inference, where systems both update internal models of the world and act upon it to reduce the divergence between predictions and observations.
  • Neural network architecture based on attention, commonly used in large language models.
  • The framework the post proposes: self-supervised models are simulators that generate simulacra. It distinguishes the simulator from the agents it simulates.
  • Bostrom's category of AIs that produce desired results given commands but do not act autonomously.
  • The view of AI as a question-answer system optimized for correctness, often inherited from supervised learning.
  • Bostrom's category of AIs that perform specific tasks without overarching goals.
  • The traditional alignment framework focusing on agents optimized to pursue goals.
  • The approach of learning from demonstrations, often assuming a single agent; Paul Christiano used 'mimicry'.
  • The property of an AI being safe to shut down or modify; discussed in the context of GPT.
  • The concept of inner vs outer alignment, referenced multiple times.
  • The thesis that sufficiently advanced agents will converge to similar subgoals.
  • Classical thesis that any level of intelligence can be combined with any goal.
  • A framework for agents that are part of the environment they act on.
  • Shard Theory
    mentions
    A theory about inner misalignment, mentioned in a footnote.

Methods (8)

method
  • Method for fine-tuning LMs based on human preferences; mentioned as combining RL and LMs.
  • A self-supervised method where generator and discriminator compete; can lead to deceptive simulations.
  • Char-RNN
    mentions
    Recurrent neural networks trained character-by-character for text generation, early precursor.
  • A model that frames RL as sequence modeling; noted as SOTA from random trajectories.
  • Generative models that reverse a noising process, mentioned in quasi-simulator table.
  • N-grams
    mentions
    Statistical model of next-letter probabilities used by Shannon.
  • A technique to filter model outputs; Redwood Research's project is mentioned.
  • A method for improving reasoning by self-training on rationales.

Artifacts (15)

artifact

Concepts (13)

concept
  • Proposed universal invariant of cognition and intelligence—capacity for goal-directed activity in a problem space, independent of substrate or embodiment.
  • Simulacra
    introduces
    The phenomena simulated by a simulator, such as agents or processes that appear in text generated by GPT.
  • Using multi-step reasoning by generating intermediate thoughts.
  • The objective of minimizing predictive error on a self-supervised distribution, leading to Bayes-optimal conditional inference.
  • A family of large language models trained on next-token prediction, central example of simulators.
  • A model optimized for prediction can simulate agents with any objectives and any degree of optimality.
  • Myopia
    mentions
    Property of an AI that does not plan far ahead; relevant to GPT's training.
  • Codex
    mentions
    OpenAI's code-generating model, another genie-like use.
  • Predictive accuracy applies pressure directly on actions rather than consequences, avoiding instrumental convergence.
  • nostalgebraist's term for measuring performance when the model is incentivised to perform well.
  • InstructGPT
    mentions
    A version of GPT fine-tuned for instruction following, exemplifying genie modality.
  • The capability of GPT-3 to learn tasks from few-shot prompts during runtime.
  • A concept referenced in a footnote regarding GPT's agents not being wrappers.

Venues (1)

venue
  • The platform where the post was published.