method
active
method:multiple-choice-evaluation-method-for-pm-training
Multiple-choice evaluation method for PM training
Using language model log probabilities of answer choices (A)/(B) to produce preference labels.
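As a minimal sketch of the idea, the snippet below turns two answer-choice log probabilities into a soft preference label. The function name `preference_label`, the returned field names, and the choice to renormalize over only the two options are assumptions made for illustration. The entry itself states only that log probabilities of (A)/(B) are used to produce labels. The log probabilities are assumed to come from a language model scoring the tokens "(A)" and "(B)" after a comparison prompt.

```python
import math


def preference_label(logprob_a: float, logprob_b: float) -> dict:
    """Convert log probabilities of "(A)" and "(B)" into a soft preference label.

    Assumption: the two log probabilities are renormalized over just these
    two choices (a two-way softmax), so the resulting probabilities sum to 1.
    """
    # Subtract the max before exponentiating for numerical stability.
    m = max(logprob_a, logprob_b)
    exp_a = math.exp(logprob_a - m)
    exp_b = math.exp(logprob_b - m)

    p_a = exp_a / (exp_a + exp_b)
    return {
        "p_a": p_a,                               # soft label for choice (A)
        "p_b": 1.0 - p_a,                         # soft label for choice (B)
        "preferred": "A" if p_a >= 0.5 else "B",  # hard label, ties go to (A)
    }


# Hypothetical scores: the model assigns 0.6 to "(A)" and 0.2 to "(B)".
label = preference_label(math.log(0.6), math.log(0.2))
print(label["preferred"], round(label["p_a"], 2))
```

The soft labels (`p_a`, `p_b`) could serve as preference-model training targets, while `preferred` gives a hard label where one is needed. Which of the two is used in practice is not specified here.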
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- Reinforcement Learning Constitutional AI · implements · The RL stage of CAI using AI feedback to train a preference model, then RL, resulting in a policy trained by RLAIF.
Concepts (1)
concept
- Chain-of-Thought Reasoning · implements · Medium through which eval awareness is often verbalized; target of intervention.
Methods (1)
method
- few-shot prompting · implements · Providing k labeled examples in the prompt to steer model behavior.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Hybrid method combining Personality Prompting (P2) with MDS injections; best overall steering method
- A response containing multiple distinct attempts to answer the prompt, used as primary metric for ESR
- Generative model substrate for active inference; discrete states, actions, outcomes, and temporal policies.
- Gradient balancing by solving multi-objective optimization for minimum-norm aggregated gradient.
- Secondary metric: percentage of responses containing multiple attempts, separating surface from actual self-correction
- Modeling framework for discrete state-space decision-making under uncertainty, used as generative model in active inference.
- Alternative approach noted but dismissed as computationally intractable for the rule-learning problem
- Reinforcement learning methods that update parameters at the end of an episode based on sampled returns.