method
active
method:cross-task-generalization-evaluation
Cross-task generalization evaluation
Measures the AUROC of a probe trained on one task when it is evaluated on a different task, to assess how universal the probe is.
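A minimal sketch of this evaluation follows, under stated assumptions. The entry does not say what kind of probe is used, so a difference-of-means linear probe stands in here. The names `auroc`, `fit_mean_difference_probe`, and `cross_task_auroc`, and the `tasks` dictionary layout, are all hypothetical and chosen for illustration only.

```python
import numpy as np


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic; tied scores get average ranks."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos, n_neg = int(labels.sum()), int((~labels).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    # Assign 1-based ranks, averaging over runs of equal scores.
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores), dtype=float)
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1

    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def fit_mean_difference_probe(X, y):
    """Stand-in probe (an assumption): direction from negative mean to positive mean."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=bool)
    return X[y].mean(axis=0) - X[~y].mean(axis=0)


def cross_task_auroc(tasks):
    """Train a probe on each task and score it on every task.

    tasks: dict mapping task name -> (X_train, y_train, X_test, y_test).
    Returns a dict mapping (train_task, eval_task) -> AUROC.
    """
    results = {}
    for src, (X_train, y_train, _, _) in tasks.items():
        w = fit_mean_difference_probe(X_train, y_train)
        for dst, (_, _, X_test, y_test) in tasks.items():
            scores = np.asarray(X_test, dtype=float) @ w
            results[(src, dst)] = auroc(y_test, scores)
    return results
```

In the resulting matrix, the diagonal entries (train and evaluate on the same task) give the in-task baseline, and the off-diagonal entries are the cross-task numbers this method is concerned with. An off-diagonal AUROC near 0.5 suggests the probe direction carries no signal on the other task, and one well below 0.5 suggests the direction is inverted there.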
Neighborhood — ranked by edge-count
Concepts (1)
concept
- AUROC (implements): Performance metric for binary classification; used to evaluate pathogenicity prediction.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The ability to generalize across tasks; lacking in latent methods.
- Whether learned cones transfer effectively across model families (Qwen vs Gemma) and sizes
- Ability to apply learned solutions to novel circumstances.
- Generalization from 2-digit to 3-4 digit arithmetic; limited by mismatch dr.
- Ability to respond appropriately to novel situations based on past regularities; fundamental to learning and intelligence.
- Validation of judge model robustness by regrading 1000 responses with 4 additional judge models
- Evaluation of learned circuits on grids 4x larger with 4x more steps than training conditions
- The ability of probes trained on one dataset to transfer accurately to topically and structurally different datasets