Out-of-Distribution Probe Generalization

The capacity of a probe trained on one true/false dataset to accurately classify statements from topically and structurally different datasets.
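
As a rough illustration of how this capacity might be measured, the sketch below fits a difference-in-means probe on one synthetic set of activations and scores it on a second, shifted set. Everything in it is an assumption made for illustration: the activations are random vectors rather than real model activations, the "truth direction" is planted by hand, and fit_mass_mean_probe, probe_accuracy, and make_synthetic_dataset are hypothetical helper names, not part of any library.

```python
import numpy as np


def fit_mass_mean_probe(acts, labels):
    """Fit a difference-in-means probe; returns (direction, bias)."""
    mu_true = acts[labels == 1].mean(axis=0)
    mu_false = acts[labels == 0].mean(axis=0)
    direction = mu_true - mu_false
    # Place the decision boundary at the midpoint between the class means.
    midpoint = (mu_true + mu_false) / 2.0
    bias = -float(direction @ midpoint)
    return direction, bias


def probe_accuracy(direction, bias, acts, labels):
    """Fraction of statements the probe classifies correctly."""
    preds = (acts @ direction + bias > 0).astype(int)
    return float((preds == labels).mean())


def make_synthetic_dataset(rng, truth_dir, offset, n=500, noise=1.0):
    """Toy stand-in for activations: label signal along truth_dir plus noise."""
    labels = rng.integers(0, 2, size=n)
    signs = 2 * labels - 1  # map {0, 1} to {-1, +1}
    signal = 2.0 * np.outer(signs, truth_dir)
    acts = offset + signal + noise * rng.normal(size=(n, truth_dir.size))
    return acts, labels


rng = np.random.default_rng(0)
dim = 16

# Assumption: both datasets share one planted truth direction.
truth_dir = np.zeros(dim)
truth_dir[0] = 1.0

# Training dataset (in-distribution).
train_acts, train_labels = make_synthetic_dataset(rng, truth_dir, np.zeros(dim))

# Held-out dataset whose activations are shifted along an unrelated axis,
# standing in for a topically and structurally different dataset.
shift = np.zeros(dim)
shift[1] = 3.0
ood_acts, ood_labels = make_synthetic_dataset(rng, truth_dir, shift)

direction, bias = fit_mass_mean_probe(train_acts, train_labels)
in_dist_acc = probe_accuracy(direction, bias, train_acts, train_labels)
ood_acc = probe_accuracy(direction, bias, ood_acts, ood_labels)

print(f"in-distribution accuracy: {in_dist_acc:.3f}")
print(f"out-of-distribution accuracy: {ood_acc:.3f}")
```

In this toy setup the probe transfers because the shift is orthogonal to the planted direction; with real activations, the gap between the two accuracies is the quantity of interest.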

Neighborhood — ranked by edge-count

concept

Out-of-Distribution (OOD) Generalization
related_to
Machine learning generalization when training and test distributions differ; linked to causal invariance.
Probe Generalization
related_to
The ability of probes trained on one dataset to transfer accurately to topically and structurally different datasets.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

EI and normalized EI could serve as a unified metric for out-of-distribution generalization. · claim · 0.782
Conjecture that maximizing EI yields causal representations invariant to distribution shifts.
Simple difference-in-mean probes generalize as well as other probing techniques while identifying directions that are more causally implicated in model outputs. · claim · 0.773
Key methodological claim: mass-mean (MM) probes are both competitive in accuracy and superior in causal influence.
Generalization · concept · 0.765
Ability to apply learned solutions to novel circumstances.
Generalisation · concept · 0.762
Ability to respond appropriately to novel situations based on past regularities; fundamental to learning and intelligence.
CoT improves in-distribution but may harm out-of-distribution generalization · claim · 0.758
Interpretation of scope generalization results
scope generalization · concept · 0.752
Generalization from 2-digit to 3-4 digit arithmetic; limited by mismatch dr.
Probe-Based Data Attribution · method · 0.749
Linear classifier approach applied to model activations to identify which training datapoints caused undesired behaviors in post-training.
task generalization · concept · 0.746
The ability to generalize across tasks; lacking in latent methods.