concept
active
concept:preferred-distribution

Preferred Distribution

In active inference, the distribution over goal states that the agent acts to bring about; here it is given by the learned self-prior rather than a hand-specified prior
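
As a rough illustration of the definition above, the sketch below treats the preferred distribution as a categorical distribution over two toy outcomes and scores candidate actions by the KL divergence between their predicted outcomes and that distribution (the "risk" term of expected free energy). It is a minimal sketch under stated assumptions: the smoothed-frequency estimator, the outcome labels, and all function names are hypothetical stand-ins, and the source does not say how the self-prior's density model is actually implemented.

```python
import math

def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def learned_prior(observation_counts, pseudocount=1.0):
    """Hypothetical stand-in for a learned self-prior: smoothed
    frequencies of outcomes the agent has experienced before."""
    return normalize([c + pseudocount for c in observation_counts])

def kl_divergence(q, p):
    """KL[q || p] for two categorical distributions of equal length."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# Toy outcome space (assumed): index 0 = "no mark on arm", 1 = "mark on arm".
experience_counts = [98, 2]            # familiar experience is mostly mark-free
preferred = learned_prior(experience_counts)

# Assumed predicted outcome distributions under two candidate actions.
predicted_if_remove = [0.95, 0.05]
predicted_if_ignore = [0.05, 0.95]

# Risk: divergence of predicted outcomes from the preferred distribution.
risk_remove = kl_divergence(predicted_if_remove, preferred)
risk_ignore = kl_divergence(predicted_if_ignore, preferred)

best_action = "remove" if risk_remove < risk_ignore else "ignore"
print(best_action, round(risk_remove, 4), round(risk_ignore, 4))
```

Because the preferred distribution is estimated from experience rather than written by hand, the preference for the mark-free outcome falls out of the counts; a hand-specified prior would instead fix `preferred` directly.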

Neighborhood — ranked by edge-count

Frameworks (1)

framework
  • Self-Prior
    extends
    The key novel contribution: an internal model that learns the density of familiar multisensory experiences and drives mark-removal behavior by minimizing mismatch with that density under the free energy principle

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The problematic possibility of digital minds with superhumanly strong preferences requiring interpersonal utility comparison frameworks
  • Probability distribution over discrete states or outcomes.
  • Key element for alignment faking: model's pre-existing preferences contradict the new training objective
  • Behavioral and stated consistency that implies the model is pursuing some objective, without claiming genuine internal states
  • The ability of active inference agents to learn their own prior preferences over outcomes by accumulating Dirichlet parameters from experience.
  • Idea that information is spread across many neurons; superposition is a subtype.
  • Preference Model · framework · 0.745
    A model trained on comparison data to assign scores to responses, used as reward signal in RLHF/RLAIF.
  • Conjugate prior for categorical variables; used for beliefs about likelihood matrix A.