method
active
method:grid-scaling-generalization-test

Grid Scaling Generalization Test

Evaluation of learned circuits on grids 4x larger, and over 4x more steps, than the training conditions.
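The description above can be illustrated with a minimal sketch of such a test harness. Everything in it is an assumption made for illustration: the entry does not say what the learned circuits are, so Conway's Game of Life on a wrap-around grid stands in for the reference dynamics, `model_step` is a hypothetical placeholder for the learned update rule, and "4x larger" is read as 4x the side length (the entry does not say whether it means side length or cell count).

```python
import numpy as np

def life_step(grid):
    """Stand-in reference rule: one Game of Life step with wrap-around edges."""
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    alive_next = (neighbors == 3) | ((grid == 1) & (neighbors == 2))
    return alive_next.astype(np.uint8)

def rollout(step_fn, grid, n_steps):
    """Apply a single-step update rule n_steps times."""
    for _ in range(n_steps):
        grid = step_fn(grid)
    return grid

def scaled_eval(model_step, reference_step, train_side, train_steps, scale=4, seed=0):
    """Compare model and reference rollouts at training scale and at `scale`x.

    Assumption: both the grid side length and the number of steps are
    multiplied by `scale`. Returns per-cell agreement for each condition.
    """
    rng = np.random.default_rng(seed)
    results = {}
    for name, k in (("train", 1), ("scaled", scale)):
        side, steps = train_side * k, train_steps * k
        start = rng.integers(0, 2, size=(side, side), dtype=np.uint8)
        predicted = rollout(model_step, start, steps)
        expected = rollout(reference_step, start, steps)
        results[name] = float((predicted == expected).mean())
    return results

# Hypothetical usage: a model that matches the reference rule exactly.
scores = scaled_eval(life_step, life_step, train_side=8, train_steps=4)
```

A gap between `scores["train"]` and `scores["scaled"]` would indicate that the learned rule does not carry over to the larger setting; replacing `life_step` as the first argument with the actual learned update function is the only change the harness needs.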

Neighborhood — ranked by edge-count

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The ability to generalize across tasks; lacking in latent methods.
  • Test-Time Scaling · concept · 0.756
    Approach using extra compute at test time to double-check answers and improve reliability.
  • Scaling aggregated gradient by the maximum gradient norm among tasks.
  • Measuring AUROC of a probe trained on one task when evaluated on another task to assess universality.
  • Generalization · concept · 0.750
    Ability to apply learned solutions to novel circumstances.
  • Generalisation · concept · 0.747
    Ability to respond appropriately to novel situations based on past regularities; fundamental to learning and intelligence.
  • The ability of probes trained on one dataset to transfer accurately to topically and structurally different datasets.
  • The capacity of a probe trained on one true/false dataset to accurately classify statements from topically and structurally different datasets.