Probe Generalization

The ability of probes trained on one dataset to transfer accurately to topically and structurally different datasets

Neighborhood — ranked by edge-count

paper

concept

Out-of-Distribution Probe Generalization
related_to
The capacity of a probe trained on one true/false dataset to accurately classify statements from topically and structurally different datasets

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Generalization · concept · 0.834
Ability to apply learned solutions to novel circumstances.
task generalization · concept · 0.830
The ability to generalize across tasks; lacking in latent methods.
Generalisation · concept · 0.825
Ability to respond appropriately to novel situations based on past regularities; fundamental to learning and intelligence.
scope generalization · concept · 0.820
Generalization from 2-digit to 3-4 digit arithmetic; limited by mismatch dr.
generalization (abstraction) · concept · 0.810
Abstracting from specific memories (e.g., specific leaves) to general lessons (food).
Probes · concept · 0.792
Interpretability tools that decode information from internal model activations; here, linear probes are used for data attribution.
Simple difference-in-mean probes generalize as well as other probing techniques while identifying directions which are more causally implicated in model outputs · claim · 0.781
Key methodological claim: mass-mean (MM) probes, i.e. difference-in-mean probes, are both competitive in accuracy and superior in causal influence.
Probe-Based Data Attribution · method · 0.769
Linear classifier approach applied to model activations to identify which training datapoints caused undesired behaviors in post-training.
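
To make the notion of probe generalization concrete, here is a minimal sketch of a difference-in-mean probe fitted on one dataset and scored on a shifted one. Everything in it is an illustrative assumption rather than any paper's setup: the "activations" are synthetic Gaussian vectors, not real model activations; the shared truth direction, the topic-shift offset, and the helper names (make_dataset, fit_diff_in_means, accuracy) are hypothetical.

```python
import numpy as np


def make_dataset(rng, n, dim, truth_dir, offset, noise=1.0):
    """Synthetic stand-in for activations (assumption, not real model data).

    The label moves each point along truth_dir; offset mimics a topic shift.
    """
    labels = rng.integers(0, 2, size=n)   # 0 = false, 1 = true
    signs = 2 * labels - 1                # map {0, 1} to {-1, +1}
    x = rng.normal(scale=noise, size=(n, dim))
    x = x + offset + 2.0 * signs[:, None] * truth_dir
    return x, labels


def fit_diff_in_means(x, y):
    """Probe direction = mean of the true class minus mean of the false class."""
    mu_pos = x[y == 1].mean(axis=0)
    mu_neg = x[y == 0].mean(axis=0)
    direction = mu_pos - mu_neg
    # Put the decision threshold at the midpoint between the two class means.
    bias = -direction @ (mu_pos + mu_neg) / 2.0
    return direction, bias


def accuracy(direction, bias, x, y):
    """Fraction of points whose projected score falls on the correct side."""
    preds = (x @ direction + bias > 0).astype(int)
    return float((preds == y).mean())


rng = np.random.default_rng(0)
dim = 32
truth_dir = np.zeros(dim)
truth_dir[0] = 1.0                        # direction shared by both datasets
offset_a = np.zeros(dim)                  # training dataset: no shift
offset_b = np.zeros(dim)
offset_b[1] = 3.0                         # held-out dataset: shifted off-axis

x_a, y_a = make_dataset(rng, 2000, dim, truth_dir, offset_a)
x_b, y_b = make_dataset(rng, 2000, dim, truth_dir, offset_b)

direction, bias = fit_diff_in_means(x_a, y_a)
acc_in = accuracy(direction, bias, x_a, y_a)    # in-distribution accuracy
acc_ood = accuracy(direction, bias, x_b, y_b)   # transfer to shifted dataset
print(f"in-distribution: {acc_in:.3f}, out-of-distribution: {acc_ood:.3f}")
```

In this toy setting the probe transfers because the shift is nearly orthogonal to the learned direction. With real activations, a gap between the two accuracies is the quantity that probe generalization describes.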