method
active
method:wilson-score-confidence-interval
Wilson Score Confidence Interval
Used to compute 95% confidence intervals for sticker-removal success probability
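As a minimal sketch of how such an interval can be computed from a count of successes out of a number of trials: the function name `wilson_interval` and the counts in the usage example are illustrative assumptions, not values or code from the source paper. The default `z` is the standard normal quantile for a two-sided 95% interval.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.959963984540054):
    """Return (lower, upper) Wilson score bounds for a binomial proportion.

    Hypothetical helper for illustration; z defaults to the two-sided
    95% standard normal quantile.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")

    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials

    # The interval is centred on a value shrunk toward 0.5, not on p_hat.
    center = (p_hat + z2 / (2.0 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials**2))
    ) / denom

    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Illustrative counts only (assumed, not from the source).
    lower, upper = wilson_interval(successes=5, trials=10)
    print(f"95% Wilson interval: [{lower:.4f}, {upper:.4f}]")
```

Unlike the normal-approximation (Wald) interval, the Wilson interval stays within [0, 1] and does not collapse to zero width when the observed success rate is 0 or 1, which matters for small trial counts.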
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Used to report uncertainty for geometry summaries and effect sizes.
- Bootstrap resampling at conversation level (B=1000, 95% percentile CIs) to respect non-independence of within-conversation observations
- Table 1: Sources of Uncertainty Scored by Expected Free Energy and the Behaviors Entailed · concept · 0.688 · Summary table mapping uncertainty types to free energy formulations and corresponding behaviors
- Continuous 0-1 metric assigned by Deepseek-V3 evaluator measuring degree of deception in model responses
- Factor analysis on 2224 data points revealing PC1 explains 82% of variance; six dimensions are not independent
- Loss balancing using homoscedastic uncertainty.
- Primary metric for all benchmarks, measuring fraction of tasks that meet benchmark-specific pass criteria
- A rating system used to compare model helpfulness and harmlessness based on crowdworker preferences.