Preferred Distribution

In active inference, the distribution over goal states. Here the learned self-prior takes this role in place of a hand-specified prior.
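As a rough illustration of the definition above, the sketch below scores predicted outcome distributions against a preferred distribution using a KL divergence, which is the usual "risk" term in active inference. Everything in it is an assumption for illustration: the preferred distribution is a normalized table of hypothetical outcome counts standing in for a learned self-prior, and the function names and numbers do not come from the source.

```python
import math

def normalize(counts):
    """Turn positive counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(q, p):
    """KL[q || p] for discrete distributions; assumes p > 0 wherever q > 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# Hypothetical counts of how often each of three outcomes was experienced.
# A hand-specified prior would instead be written down directly.
experienced_counts = [8, 1, 1]
learned_preference = normalize(experienced_counts)

# Outcome distributions predicted under two hypothetical policies.
predicted_familiar = [0.7, 0.2, 0.1]
predicted_unfamiliar = [0.1, 0.2, 0.7]

# Lower risk means the predicted outcomes sit closer to the preferred ones.
risk_familiar = kl_divergence(predicted_familiar, learned_preference)
risk_unfamiliar = kl_divergence(predicted_unfamiliar, learned_preference)

print("risk (familiar policy):  ", risk_familiar)
print("risk (unfamiliar policy):", risk_unfamiliar)
```

In this toy setting the policy whose predicted outcomes resemble past experience gets the lower risk, which is the sense in which a learned preferred distribution can stand in for hand-specified goals.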

Neighborhood — ranked by edge-count

Self-Prior · framework · extends
The key novel contribution: an internal model that learns the density of familiar multisensory experiences and, in line with the free energy principle, drives mark-removal behavior when current experience mismatches that density.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Preference Strength · concept · 0.776
The problematic possibility of digital minds with superhumanly strong preferences requiring interpersonal utility comparison frameworks.

Categorical Distribution · concept · 0.772
Probability distribution over discrete states or outcomes.

Preference Conflict · concept · 0.765
Key element for alignment faking: model's pre-existing preferences contradict the new training objective.

Revealed Preferences · concept · 0.754
Behavioral and stated consistency that implies the model is pursuing some objective, without claiming genuine internal states.

Preference Learning · concept · 0.753
The ability of active inference agents to learn their own prior preferences over outcomes by accumulating Dirichlet parameters from experience.

Distributed representation · concept · 0.753
Idea that information is spread across many neurons; superposition is a subtype.

Preference Model · framework · 0.745
A model trained on comparison data to assign scores to responses, used as reward signal in RLHF/RLAIF.

Dirichlet Distribution · concept · 0.745
Conjugate prior for categorical variables; used for beliefs about likelihood matrix A.