method
active
method:pass-rate-scoring

Pass Rate Scoring

Primary metric for all benchmarks: the fraction of tasks that meet the benchmark-specific pass criteria.
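As a minimal sketch of the definition above, pass rate can be computed as the number of passing tasks divided by the total number of tasks. The function name `pass_rate`, the boolean per-task outcome representation, and the sample values are assumptions made for illustration; the entry does not say how pass criteria are evaluated or how empty task sets are handled.

```python
from typing import Iterable


def pass_rate(results: Iterable[bool]) -> float:
    """Return the fraction of tasks whose outcome met the pass criterion.

    Each element of `results` is assumed to be True when the task met the
    benchmark-specific pass criterion and False otherwise.
    """
    outcomes = list(results)
    if not outcomes:
        # Assumption: an empty task set has no meaningful pass rate.
        raise ValueError("pass rate is undefined for an empty task set")
    # True counts as 1 and False as 0, so sum() gives the number of passes.
    return sum(outcomes) / len(outcomes)


# Hypothetical per-task outcomes for one benchmark run.
outcomes = [True, False, True, True]
rate = pass_rate(outcomes)
print(f"pass rate: {rate:.2f}")
```

Raising an error on an empty input, rather than returning 0.0, is one possible design choice: it keeps a run with no tasks from being mistaken for a run in which every task failed.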

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The pass rate among a model's skill-loaded trajectories, measuring outcome conditioned on harness activation
  • Probe score · concept · 0.766
    Dot product between hidden state and concept vector averaged across 5-layer window around best layer; measures model's internal emotive state
  • A scoring rule optimized by predicting true probabilities; log-loss is one.
  • Score = (sum of completed quartet values) × (number of quartets), making portfolio composition consequential.
  • safety scores · concept · 0.722
    Metrics derived from benchmarks to quantify how safe a model is, e.g., refusal rate to harmful requests.
  • Reflection rate · concept · 0.722
    Ratio of reflection steps to total reasoning steps, used to quantify reflection behavior
  • Elo score · method · 0.719
    A rating system used to compare model helpfulness and harmlessness based on crowdworker preferences.
  • Learning Rate · concept · 0.719
    Hyperparameter for optimizing model parameters through learning in active inference.