concept
active
concept:inexpensive-preferences
Inexpensive Preferences
Designing digital minds to have preferences that are trivially easy to satisfy, yielding high welfare at minimal resource cost
Neighborhood — ranked by edge-count
Concepts (1)
- Preference-Satisfactionist Account of Well-Being (associated_with, implements): The view that well-being consists in preference satisfaction, under which inexpensive preferences and preference strength matter
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Behavioral and stated consistency that implies the model is pursuing some objective, without claiming genuine internal states
- Target distribution over states or outcomes encoded in the generative model; goal states.
- The ability of active inference agents to learn their own prior preferences over outcomes by accumulating Dirichlet parameters from experience.
- Post-training alignment method during which undesirable behaviors emerged in the studied model.
- Key element for alignment faking: the model's pre-existing preferences contradict the new training objective
- The problematic possibility of digital minds with superhumanly strong preferences requiring interpersonal utility comparison frameworks
- Using normalized log-probabilities from the feedback model as soft targets for preference model training.