method
active
method:judge-model-scoringJudge Model Scoring
Claude 4.5 Haiku used to segment responses into attempts and score each attempt 0-100 for relevance
Neighborhood — ranked by edge-count
Methods (1)
method
- Three-step protocol: (1) object-level prompting, (2) SAE-latent steering, (3) judge model scoring of attempts
Related by similarity (8)
cosine ≥ 0.65 · no typed edgeEntities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
- Using Claude Sonnet 4 as a grader to categorize model responses according to predefined criteria.
- Comparing models using log-evidence approximated by free energy.
- Dot product between hidden state and concept vector averaged across 5-layer window around best layer; measures model's internal emotive state
- Probability of data under the model, penalizing complexity and rewarding accuracy.
- A representation that captures relevant aspects of a system; according to the theorem, the regulator must embody this.
- Inferring opponents' wealth, holdings, and strategies to inform decisions.
- Factor analysis on 2224 data points revealing PC1 explains 82% of variance; six dimensions are not independent