method
active
method:cross-task-generalization-evaluation

Cross-task generalization evaluation

Measures the AUROC of a probe trained on one task when it is evaluated on a different task, to assess whether what the probe captures is universal across tasks.
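The following is a minimal sketch of the evaluation loop. It assumes each task supplies labeled activation vectors split into train and test sets. The difference-of-means probe (`fit_mean_diff_probe`) is a hypothetical stand-in for whatever probe the method actually uses, and all function names and the synthetic data are illustrative only. AUROC is computed as the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counting half.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Split = Tuple[List[Vector], List[int]]   # (activations, binary labels)
Task = Tuple[Split, Split]               # (train split, test split)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_diff_probe(split: Split) -> Vector:
    """Hypothetical probe: mean(positive activations) - mean(negative activations)."""
    xs, ys = split
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    dim = len(xs[0])
    mu_pos = [sum(x[i] for x in pos) / len(pos) for i in range(dim)]
    mu_neg = [sum(x[i] for x in neg) / len(neg) for i in range(dim)]
    return [a - b for a, b in zip(mu_pos, mu_neg)]


def probe_scores(direction: Vector, xs: List[Vector]) -> List[float]:
    """Project each activation onto the probe direction."""
    return [sum(w * v for w, v in zip(direction, x)) for x in xs]


def cross_task_auroc(tasks: Dict[str, Task]) -> Dict[Tuple[str, str], float]:
    """AUROC for every (train task, eval task) pair, scored on the eval task's test split."""
    results = {}
    for train_name, (train_split, _) in tasks.items():
        direction = fit_mean_diff_probe(train_split)
        for eval_name, (_, (test_xs, test_ys)) in tasks.items():
            results[(train_name, eval_name)] = auroc(probe_scores(direction, test_xs), test_ys)
    return results


def make_split(rng: random.Random, signal_dim: int, n: int = 200, dim: int = 4) -> Split:
    """Synthetic data: the label shifts coordinate `signal_dim` by +/-2; other coordinates are noise."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[signal_dim] += 2.0 if y == 1 else -2.0
        xs.append(x)
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    # Tasks A and B encode the label along the same axis; task C uses a different axis.
    tasks = {
        name: (make_split(rng, d), make_split(rng, d))
        for name, d in [("A", 0), ("B", 0), ("C", 1)]
    }
    for (train, ev), value in sorted(cross_task_auroc(tasks).items()):
        print(f"train {train} -> eval {ev}: AUROC {value:.3f}")
```

Diagonal entries give in-task AUROC. In this synthetic setup, a probe trained on A should score close to its in-task AUROC on B (shared axis) and close to 0.5 on C (different axis), which is the pattern the evaluation is meant to detect.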

Neighborhood (ranked by edge count)

Concepts (1)

concept
  • AUROC
    implements
    Performance metric for binary classification; used to evaluate pathogenicity prediction.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood that have no typed relation to this one: candidates for new edges or unrecognized duplicates.

  • The ability to generalize across tasks; lacking in latent methods.
  • Whether learned cones transfer effectively across model families (Qwen vs Gemma) and sizes
  • Generalization · concept · 0.792
    Ability to apply learned solutions to novel circumstances.
  • Generalization from 2-digit to 3- to 4-digit arithmetic; limited by mismatch dr.
  • Generalisation · concept · 0.762
    Ability to respond appropriately to novel situations based on past regularities; fundamental to learning and intelligence.
  • Validation of judge model robustness by regrading 1000 responses with 4 additional judge models
  • Evaluation of learned circuits on grids 4x larger with 4x more steps than training conditions
  • The ability of probes trained on one dataset to transfer accurately to topically and structurally different datasets
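The selection rule for the similarity list (cosine ≥ 0.65 and no typed edge) can be sketched as below. This is an illustration under stated assumptions, not the graph tool's actual implementation: it assumes every entity has an embedding vector, treats typed edges as undirected, and sorts results by descending similarity. The function names and inputs are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_without_edge(
    focus: str,
    embeddings: Dict[str, List[float]],
    typed_edges: Set[Tuple[str, str]],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Entities with cosine >= threshold to `focus` and no typed edge to it, most similar first."""
    candidates = []
    for other, vec in embeddings.items():
        if other == focus:
            continue
        # Assumption: a typed edge in either direction disqualifies the pair.
        if (focus, other) in typed_edges or (other, focus) in typed_edges:
            continue
        sim = cosine(embeddings[focus], vec)
        if sim >= threshold:
            candidates.append((other, sim))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates
```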