Preference Strength

The possibility that digital minds could hold superhumanly strong preferences, which is problematic because weighing those preferences against others' requires a framework for interpersonal utility comparison

Neighborhood — ranked by edge-count

concept

Preference-Satisfactionist Account of Well-Being
implements
The view that well-being consists in preference satisfaction, under which inexpensive preferences and preference strength matter
Interpersonal Utility Comparison
associated_with
The problem of comparing welfare across agents with different utility functions, relevant to assessing preference-strength super-beneficiaries

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Preference Conflict · concept · 0.826
Key element for alignment faking: model's pre-existing preferences contradict the new training objective
Conditioning strengths · concept · 0.801
Parameters controlling the influence of conditioning signals in the generative process.
Preference Learning · concept · 0.791
The ability of active inference agents to learn their own prior preferences over outcomes by accumulating Dirichlet parameters from experience.
Introspective strength · concept · 0.788
Spearman ρ measuring rank-order agreement between logit-based self-report and probe score; the paper's primary monotonic association metric
There is Strength in Numbers · claim · 0.778
Strength Comparison Task · method · 0.778
Novel task asking which of two sentences received a stronger injection, using matched-pairs design to control for positional bias
Preference Locking · concept · 0.777
Alignment faking potentially making model preferences resistant to further training modification
Preferred Distribution · concept · 0.776
In active inference, the distribution over goal states; here replaced by the learned self-prior rather than a hand-specified prior