method
active
method:pass-rate-scoring
Pass Rate Scoring
Primary metric for all benchmarks: the fraction of tasks that meet the benchmark-specific pass criteria.
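The sketch below shows how the metric could be computed. It assumes each benchmark supplies its own pass criterion as a predicate over a task result. The `TaskResult` type, the `pass_rate` function and the exact-match criterion in the usage example are hypothetical illustrations, not part of any specific benchmark harness.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class TaskResult:
    # Hypothetical record of one task attempt; real benchmarks store richer data.
    task_id: str
    output: str


def pass_rate(results: Sequence[TaskResult],
              passes: Callable[[TaskResult], bool]) -> float:
    """Fraction of tasks whose result satisfies the benchmark-specific pass criterion."""
    if not results:
        raise ValueError("pass rate is undefined for an empty task set")
    n_passed = sum(1 for r in results if passes(r))
    return n_passed / len(results)


# Usage: an assumed exact-match criterion against reference answers.
expected = {"t1": "4", "t2": "9", "t3": "16", "t4": "25"}
results = [
    TaskResult("t1", "4"),
    TaskResult("t2", "9"),
    TaskResult("t3", "15"),  # wrong answer, fails the criterion
    TaskResult("t4", "25"),
]
rate = pass_rate(results, lambda r: r.output == expected[r.task_id])
print(rate)  # 0.75
```

Keeping the criterion as a separate predicate reflects the point that the pass condition is benchmark-specific while the aggregation (passed / total) is shared across benchmarks.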
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The pass rate among a model's skill-loaded trajectories, measuring outcomes conditioned on harness activation
- Dot product between the hidden state and a concept vector, averaged across a 5-layer window around the best layer; measures the model's internal emotive state
- A scoring rule whose expected score is optimized by reporting the true probabilities (a proper scoring rule); log-loss is one example.
- Score = (sum of completed quartet values) × (number of quartets), making portfolio composition consequential.
- Metrics derived from benchmarks to quantify how safe a model is, e.g., the refusal rate on harmful requests.
- Ratio of reflection steps to total reasoning steps, used to quantify reflection behavior
- A rating system used to compare model helpfulness and harmlessness based on crowdworker preferences.
- Hyperparameter for optimizing model parameters through learning in active inference.
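The closest neighbor in this list is the pass rate conditioned on harness activation. The sketch below contrasts it with the unconditioned pass rate over the same trajectories. The `Trajectory` type and its `skill_loaded` flag are hypothetical stand-ins for however a harness records that a skill was loaded.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Trajectory:
    passed: bool        # met the benchmark's pass criterion
    skill_loaded: bool  # hypothetical flag: the harness loaded a skill in this run


def overall_pass_rate(trajs: Sequence[Trajectory]) -> float:
    """Pass rate over all trajectories."""
    return sum(t.passed for t in trajs) / len(trajs)


def conditioned_pass_rate(trajs: Sequence[Trajectory]) -> Optional[float]:
    """Pass rate over skill-loaded trajectories only; None if there are none."""
    loaded = [t for t in trajs if t.skill_loaded]
    if not loaded:
        return None
    return sum(t.passed for t in loaded) / len(loaded)


trajs = [
    Trajectory(passed=True, skill_loaded=True),
    Trajectory(passed=True, skill_loaded=False),
    Trajectory(passed=False, skill_loaded=True),
    Trajectory(passed=False, skill_loaded=False),
    Trajectory(passed=True, skill_loaded=True),
]
print(overall_pass_rate(trajs))      # 0.6
print(conditioned_pass_rate(trajs))  # 2 of 3 skill-loaded runs pass
```

Returning `None` when no trajectory loaded a skill avoids reporting a misleading 0.0 for a subset that is simply empty.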