Inexpensive Preferences

Designing digital minds to have preferences that are trivially easy to satisfy, yielding high welfare at minimal resource cost

Neighborhood — ranked by edge-count

concept

Preference-Satisfactionist Account of Well-Being
associated_with · implements
The view that well-being consists in preference satisfaction, under which inexpensive preferences and preference strength matter

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Revealed Preferences · concept · 0.770
Behavioral and stated consistency that implies the model is pursuing some objective, without claiming genuine internal states
Prior Preferences · concept · 0.759
Target distribution over states or outcomes encoded in the generative model; goal states.
Preference Learning · concept · 0.756
The ability of active inference agents to learn their own prior preferences over outcomes by accumulating Dirichlet parameters from experience.
Direct Preference Optimization · framework · 0.751
Post-training alignment method during which undesirable behaviors emerged in the studied model.
Preference Conflict · concept · 0.747
Key element for alignment faking: the model's pre-existing preferences contradict the new training objective.
Where Do The Goals Preferences And Attachments Of · question · 0.737
Preference Strength · concept · 0.734
The problematic possibility of digital minds with superhumanly strong preferences requiring interpersonal utility comparison frameworks
Soft preference labels · method · 0.733
Using normalized log-probabilities from the feedback model as soft targets for preference model training.
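
The "cosine ≥ 0.65 · no typed edge" filter behind the list above can be sketched as follows. This is a minimal illustration only: the function names, the toy two-dimensional embeddings, and the edge representation are all hypothetical, and the only value taken from this page is the 0.65 threshold.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities that are similar to `target`
    but share no typed edge with it, sorted by descending score."""
    # Entities already connected to the target by a typed edge, in either direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}

    candidates = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            candidates.append((name, score))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): "D" is similar to "A" but already has a typed edge.
embeddings = {
    "A": [1.0, 0.0],
    "B": [1.0, 0.5],
    "C": [0.0, 1.0],
    "D": [1.0, 1.0],
    "E": [1.0, 1.0],
}
typed_edges = [("A", "D")]

result = untyped_neighbors("A", embeddings, typed_edges)
for name, score in result:
    print(f"{name} {score:.3f}")
```

Entities that pass the filter are only candidates: a high score may point to a missing edge or to a duplicate entry, and telling the two apart still requires review.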