Probe-based early-exit

Strategy introduced in the paper to stop generation early based on probe confidence, saving tokens while retaining accuracy.
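
As a minimal sketch of what such a stopping rule could look like: the entry only says that generation stops based on probe confidence, so everything concrete here is an assumption for illustration. That covers the per-step confidence list, the `threshold` and `patience` parameters, and the function names. It is not the paper's actual procedure.

```python
from typing import Optional, Sequence


def early_exit_index(
    confidences: Sequence[float],
    threshold: float = 0.9,   # hypothetical confidence cutoff
    patience: int = 3,        # hypothetical number of consecutive confident steps
) -> Optional[int]:
    """Return the first step index at which generation would stop, or None.

    Assumed rule: stop once the probe confidence has stayed at or above
    `threshold` for `patience` consecutive steps.
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")
    streak = 0
    for i, conf in enumerate(confidences):
        # Extend the streak on a confident step, reset it otherwise.
        streak = streak + 1 if conf >= threshold else 0
        if streak >= patience:
            return i
    return None


def tokens_saved_fraction(exit_index: Optional[int], total_steps: int) -> float:
    """Fraction of steps skipped by exiting after `exit_index` (0.0 if no exit)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if exit_index is None:
        return 0.0
    return 1.0 - (exit_index + 1) / total_steps
```

Requiring several consecutive confident steps, rather than a single one, is one simple way to approximate "the belief is already stable"; a lower `patience` exits sooner at the risk of stopping on a transient spike.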

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Probe-Guided Early Exit · concept · 0.918
Using activation probes to terminate chain-of-thought (CoT) generation early when the model's belief is already stable, saving compute.
Probes · concept · 0.747
Interpretability tools that decode information from internal model activations; here, linear probes are used for data attribution.
Probe-guided early exit reduces tokens by up to 80% on MMLU and 30% on GPQA-Diamond with similar accuracy · claim · 0.746
Practical efficiency claim for using activation probes to enable adaptive computation
Probe-Based Data Attribution · method · 0.736
Linear classifier approach applied to model activations to identify which training datapoints caused undesired behaviors in post-training.
Probe-guided early exit reduces tokens by up to 30% on GPQA-Diamond with similar accuracy on DeepSeek-R1 671B and GPT-OSS 120B · finding · 0.733
Quantitative efficiency result on a hard benchmark; the smaller reduction reflects a genuine need for reasoning.
Probe-based method bridges interpretability (probes/activations) with data-centric alignment work · claim · 0.729
Assertion from the paper's notes that the work connects two previously separate areas: interpretability tools and data-centric alignment.
Probe score · concept · 0.721
Dot product between the hidden state and the concept vector, averaged across a 5-layer window around the best layer; measures the model's internal emotive state.
Probe Generalization · concept · 0.709
The ability of probes trained on one dataset to transfer accurately to topically and structurally different datasets
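
The "Probe score" entry above describes a dot product averaged over a 5-layer window around the best layer. A hedged sketch of that computation follows. Several details are assumptions not stated in the entry: that there is one concept vector per layer, that the window is centered on the best layer, and that it is clipped at the first and last layers. The function names are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def probe_score(
    hidden_states: Sequence[Vector],    # one hidden-state vector per layer
    concept_vectors: Sequence[Vector],  # assumed: one concept vector per layer
    best_layer: int,
    window: int = 5,                    # window size taken from the entry
) -> float:
    """Mean of per-layer dot products over a window centered on `best_layer`."""
    if len(hidden_states) != len(concept_vectors):
        raise ValueError("need one concept vector per layer")
    n_layers = len(hidden_states)
    if not 0 <= best_layer < n_layers:
        raise ValueError("best_layer is out of range")
    half = window // 2
    # Assumed behavior: clip the window at the model's first and last layers.
    lo = max(0, best_layer - half)
    hi = min(n_layers, best_layer + half + 1)
    scores = [dot(hidden_states[l], concept_vectors[l]) for l in range(lo, hi)]
    return sum(scores) / len(scores)
```

If a single concept vector is shared across layers, the same function can be used by passing that vector repeated once per layer.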