Multiple-choice evaluation method for PM training

Uses a language model's log probabilities for the answer choices (A) and (B) to produce preference labels.
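
A minimal sketch of how two answer-choice log probabilities could be turned into a preference label. The description only says that the log probabilities of (A) and (B) are used; renormalizing them into a soft label that sums to 1 is an assumption here, and the function names `preference_label` and `hard_label` are hypothetical.

```python
import math
from typing import Tuple


def preference_label(logp_a: float, logp_b: float) -> Tuple[float, float]:
    """Renormalize the log probabilities of (A) and (B) into a soft label.

    Assumption: the label is a two-way softmax over the two log
    probabilities, so any probability mass on other tokens is ignored.
    """
    # Subtract the max before exponentiating for numerical stability.
    m = max(logp_a, logp_b)
    ea = math.exp(logp_a - m)
    eb = math.exp(logp_b - m)
    z = ea + eb
    return ea / z, eb / z


def hard_label(logp_a: float, logp_b: float) -> str:
    """Pick the higher-scoring choice; ties go to (A) by convention here."""
    return "A" if logp_a >= logp_b else "B"


# Example: the model assigns probability 0.6 to "(A)" and 0.2 to "(B)".
p_a, p_b = preference_label(math.log(0.6), math.log(0.2))
print(round(p_a, 2), round(p_b, 2), hard_label(math.log(0.6), math.log(0.2)))
```

A soft label keeps the model's degree of confidence as a training target, whereas the hard label discards it; which of the two a given pipeline uses is not stated in the description.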

Neighborhood — ranked by edge-count

framework

Reinforcement Learning Constitutional AI
implements
The RL stage of CAI: uses AI feedback to train a preference model, then runs RL against it, yielding a policy trained by RLAIF.

concept

Chain-of-Thought Reasoning
implements
Medium through which eval awareness is often verbalized; target of intervention.

method

few-shot prompting
implements
Providing k labeled examples in the prompt to steer model behavior.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

PM Hybrid Method · method · 0.755
Hybrid method combining Personality Prompting (P2) with MDS injections; best overall steering method
Multi-Attempt Response · concept · 0.741
A response containing multiple distinct attempts to answer the prompt, used as primary metric for ESR
Markov Decision Process (MDP) · framework · 0.736
Generative model substrate for active inference; discrete states, actions, outcomes, and temporal policies.
Multiple Gradient Descent Algorithm (MGDA) · method · 0.732
Gradient balancing by solving multi-objective optimization for minimum-norm aggregated gradient.
Multi-Attempt Rate (metric) · concept · 0.726
Secondary metric: percentage of responses containing multiple attempts, separating surface from actual self-correction
Partially Observable Markov Decision Process (POMDP) · framework · 0.717
Modeling framework for discrete state-space decision-making under uncertainty, used as generative model in active inference.
Partially Observed Markov Decision Process · framework · 0.716
Alternative approach noted but dismissed as computationally intractable for the rule-learning problem
Monte-Carlo reinforcement learning · method · 0.714
Reinforcement learning methods that update parameters at the end of an episode based on sampled returns.