method
active
method:grid-scaling-generalization-test
Grid Scaling Generalization Test
Evaluates learned circuits on grids 4x larger, run for 4x more steps, than those used in training.
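As a rough illustration of what such a test might look like, the sketch below compares a candidate update rule against a reference rule at training scale and again at 4x scale. It is not taken from the source: Conway's Game of Life stands in for the learned circuit, "4x larger" is assumed to mean 4x the side length, and the names `life_step`, `rollout`, and `scaling_test` are hypothetical.

```python
import numpy as np


def life_step(grid):
    """Stand-in local rule: one Game of Life step on a wrap-around grid."""
    # Count the eight neighbours of every cell using periodic shifts.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    alive = (neighbors == 3) | ((grid == 1) & (neighbors == 2))
    return alive.astype(np.uint8)


def rollout(step_fn, grid, steps):
    """Apply step_fn to the grid `steps` times and return the final state."""
    for _ in range(steps):
        grid = step_fn(grid)
    return grid


def scaling_test(candidate_step, reference_step, train_side, train_steps,
                 factor=4, seed=0):
    """Return per-cell agreement with the reference at train and scaled sizes.

    Assumption: the scaled condition multiplies both the side length and the
    number of steps by `factor`.
    """
    rng = np.random.default_rng(seed)
    conditions = (
        ("train", train_side, train_steps),
        ("scaled", train_side * factor, train_steps * factor),
    )
    results = {}
    for name, side, steps in conditions:
        start = rng.integers(0, 2, size=(side, side), dtype=np.uint8)
        predicted = rollout(candidate_step, start, steps)
        expected = rollout(reference_step, start, steps)
        results[name] = float((predicted == expected).mean())
    return results
```

A candidate that has captured the underlying local rule should score the same in both conditions, while one that has overfit to the training grid size or rollout length would be expected to degrade in the `scaled` condition.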
Neighborhood — ranked by edge-count
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The ability to generalize across tasks; lacking in latent methods.
- Approach using extra compute at test time to double-check answers and improve reliability.
- Scaling aggregated gradient by the maximum gradient norm among tasks.
- Measuring AUROC of a probe trained on one task when evaluated on another task to assess universality.
- Ability to apply learned solutions to novel circumstances.
- Ability to respond appropriately to novel situations based on past regularities; fundamental to learning and intelligence.
- The ability of probes trained on one dataset to transfer accurately to topically and structurally different datasets.
- The capacity of a probe trained on one true/false dataset to accurately classify statements from topically and structurally different datasets.