concept
active
concept:probe-generalization
Probe Generalization
The ability of probes trained on one dataset to transfer accurately to topically and structurally different datasets
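A minimal sketch of how this definition could be measured, assuming synthetic data in place of real model activations: fit a linear probe on one dataset, then score it on a held-out split of the same dataset and on a second dataset with a different "topic" offset. Everything here is hypothetical and for illustration only: the helper names (`make_dataset`, `fit_probe`, `accuracy`), the dimensions, and the shift sizes are all assumptions. The difference-of-means probe is one simple choice of linear probe, not necessarily the method used in the linked paper.

```python
import random

def make_dataset(n, dim, truth_axis, topic_axis, topic_shift, seed):
    """Synthetic stand-in for activations: the label moves truth_axis,
    the dataset's 'topic' shifts topic_axis by a constant."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[truth_axis] += 3.0 if y == 1 else -3.0
        x[topic_axis] += topic_shift
        xs.append(x)
        ys.append(y)
    return xs, ys

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def fit_probe(xs, ys):
    """Difference-of-means linear probe: direction w, bias b at the class midpoint."""
    mu_true = mean([x for x, y in zip(xs, ys) if y == 1])
    mu_false = mean([x for x, y in zip(xs, ys) if y == 0])
    w = [t - f for t, f in zip(mu_true, mu_false)]
    mid = [(t + f) / 2 for t, f in zip(mu_true, mu_false)]
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, b

def accuracy(probe, xs, ys):
    """Fraction of examples whose probe score has the sign matching the label."""
    w, b = probe
    correct = 0
    for x, y in zip(xs, ys):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        correct += pred == y
    return correct / len(ys)

# Training set and a held-out set drawn from the same (hypothetical) distribution.
train_x, train_y = make_dataset(400, 8, truth_axis=0, topic_axis=1, topic_shift=0.0, seed=0)
held_x, held_y = make_dataset(400, 8, truth_axis=0, topic_axis=1, topic_shift=0.0, seed=1)
# A "different dataset": same truth axis, but offset along another axis.
shift_x, shift_y = make_dataset(400, 8, truth_axis=0, topic_axis=2, topic_shift=4.0, seed=2)

probe = fit_probe(train_x, train_y)
in_dist_acc = accuracy(probe, held_x, held_y)
transfer_acc = accuracy(probe, shift_x, shift_y)
print(f"in-distribution accuracy: {in_dist_acc:.3f}")
print(f"transfer accuracy:        {transfer_acc:.3f}")
```

The gap between the two accuracies is the quantity of interest. In this toy setup the truth direction is shared across datasets by construction, so the probe transfers well. With real activations that sharing is an empirical question.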
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- Out-of-Distribution Probe Generalization (related_to): The capacity of a probe trained on one true/false dataset to accurately classify statements from topically and structurally different datasets
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Ability to apply learned solutions to novel circumstances.
- The ability to generalize across tasks; lacking in latent methods.
- Ability to respond appropriately to novel situations based on past regularities; fundamental to learning and intelligence.
- Generalization from 2-digit to 3-4 digit arithmetic; limited by mismatch dr.
- Abstracting from specific memories (e.g., specific leaves) to general lessons (food).
- Interpretability tools that decode information from internal model activations; here, linear probes are used for data attribution.
- Key methodological claim: MM probes are both competitive in accuracy and superior in causal influence
- Linear classifier approach applied to model activations to identify which training datapoints caused undesired behaviors in post-training.